Abstract

We show how to answer exponentially many queries from multiple analysts on a private database, 
while protecting differential privacy both for the individuals in the database and for the analysts. Our 
mechanism is the first to offer differential privacy on the joint-distribution over analysts' answers, pro- 
viding privacy for data analysts even if the other data analysts may share information or register multiple 
accounts. In some settings, we are able to achieve nearly optimal error rates (even as compared to 
mechanisms which need not offer analyst privacy), and we are able to extend our techniques to give 
mechanisms which answer even non-linear queries. Our analysis is based on viewing and solving the 
private query-release problem as a two-player zero sum game, which may be of independent interest. 

1 Introduction 

Consider a tracking network that collects consumer data and wants to sell its database to several competing 
analysts conducting market research. The administrator of the tracking network faces many possible con- 
straints. For legal reasons, she may want to protect the privacy of the individuals contained in her database. 
She is in the business of selling data, so she must allow the analysts to query the database, and provide accu- 
rate answers to those queries. Finally, she may have to guarantee privacy of the queries made to the database, 
since the analysts are in competition and their queries may be disclosive of their proprietary strategies. 

The question of analyst privacy was recently raised in a beautiful paper of Dwork, Naor, and Vadhan 
[DNV12]. They showed that differentially private stateless mechanisms — which answer each query inde- 
pendently of the other queries that have been asked — can only give accurate answers when the number of 
queries is at most quadratic in the size of the database. This result rules out mechanisms that perfectly 
protect the privacy of the queries, but does not preclude mechanisms that offer a differential-privacy-like 
guarantee with respect to the queries. Indeed, [DNV12] give such a mechanism: their mechanism has the
guarantee that the marginal distribution on answers given to each analyst is differentially private with
respect to the entire set of queries made by all of the other analysts. Their mechanism is capable of answering
large numbers of linear queries with error $\tilde{O}(1/n^{1/4})$. A linear query, also referred to as a counting query,
is a query of the form "What fraction of the individual records in the database satisfy some property $q$?"
Here $n$ is the number of records in the database.

However, as they note, the DNV mechanism has several shortcomings. First among these is that it 
does not promise differential privacy on the joint distribution over multiple analysts' answers. Therefore, if
multiple analysts collude, or if a single analyst registers several false accounts with the mechanism, then 
the mechanism no longer guarantees query privacy. Second, the mechanism does not necessarily achieve 
optimal accuracy, since it answers queries less accurately than do the best mechanisms without analyst 
privacy. Finally, the [DNV12] mechanism can only answer linear queries (also referred to as counting
queries), and their techniques break down when moving to arbitrary low-sensitivity queries. 

In this paper, we address these issues. First we consider mechanisms which guarantee one-query-to- 
many-analyst-privacy. That is, we require that for each analyst a, the joint distribution over answers given 
to all other analysts $a' \neq a$ should be differentially private with respect to the change of a single query asked
by analyst $a$. This privacy guarantee is incomparable to the guarantee of [DNV12]: on the one hand, it is
weaker, because we protect the privacy of a single query, rather than protecting the privacy of all queries
asked by analysts $a' \neq a$. On the other hand, it is stronger, because the privacy of one query from an analyst
$a$ is preserved even if all other analysts $a' \neq a$ collude. Mechanisms with this guarantee are also resistant
to sybil attacks, in which an analyst registers many different accounts with the database administrator. Our
first result is a mechanism that offers one-query-to-many-analyst privacy with error at most $O(1/\sqrt{n})$ for
answering large numbers of linear queries. This error rate is optimal up to polylogarithmic factors. 

We then show how to extend our techniques to one-analyst-to-many-analyst privacy, where we require 
that the mechanism preserves the privacy of analyst a even if he changes all of his queries, even if all analysts 
$a' \neq a$ collude. Our second result is a mechanism that offers one-analyst-to-many-analyst privacy with error
$O(1/n^{1/3})$. Although this error rate is worse than what we achieve for one-query-to-many-analyst privacy
(and not necessarily optimal), our mechanism is capable of answering exponentially many queries with 
non-trivial accuracy guarantees, while satisfying strong notions of both data and analyst privacy. 

The two mechanisms we just described operate in the non-interactive setting, where the queries from 
every analyst are given to the mechanism in a single batch. Our third result is a mechanism in the online 
setting that satisfies one-query-to-many analyst privacy. The mechanism accurately answers a (possibly 
exponentially long) fixed sequence of low-sensitivity queries. Although our mechanism operates as queries 
arrive online, it cannot tolerate adversarially chosen queries (i.e. it operates in the same regime as the smooth 
multiplicative weights algorithm of Hardt and Rothblum [HR10]). Our mechanism gives answers with error 
at most $O(1/n^{2/5})$. We are also able to extend this mechanism to answer arbitrary low-sensitivity queries,
albeit with worse accuracy guarantees. For answering queries with sensitivity $1/n$ (which, for comparison,
is the sensitivity of a linear query), the mechanism guarantees error at most $O(1/n^{1/10})$.

We mention that when answering $k$ queries on a database $D \in \mathcal{X}^n$, our offline algorithms run in time
$O(|\mathcal{X}|)$ and our online algorithm for linear queries runs in time $O(|\mathcal{X}|)$ per query. Although it would
be desirable for the mechanism to run in time $\mathrm{polylog}|\mathcal{X}|$, running time of $|\mathcal{X}|$ is essentially optimal for
mechanisms such as ours that answer more than $n^2$ arbitrary linear queries [Ull12].

1.1 Our Techniques 

To prove our results, we take a novel view of private query release, which may be of independent interest. 
Consider a two player zero-sum game played between a query player and a data player. For each query 
of interest $q \in Q$, the query player has two actions: $a_q$ and $a_{\neg q}$. For each element of the data universe
$x \in \mathcal{X}$, the data player has one action, $a_x$. The cost matrix $G$ is defined so that $G(a_q, x) = q(x) - q(D)$
and $G(a_{\neg q}, x) = q(D) - q(x)$, where $D$ is the private database. The query player wishes to maximize the
cost, whereas the database player wishes to minimize the cost. We show that the value of this game is 0, and
that any $\rho$-approximate equilibrium strategy for the database player corresponds to a database that answers
every query $q \in Q$ correctly up to additive error $\rho$.
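
The game described above can be sketched numerically. The following is a minimal illustration, assuming a hypothetical toy universe of four elements and three predicate queries (none of these values come from the paper). It builds the cost matrix and checks that the true normalized histogram, played as a mixed strategy of the data player, gives every query action expected cost 0, while another strategy's worst-case cost equals its maximum additive error.

```python
import numpy as np

def game_matrix(queries, data_hist):
    """Cost matrix with rows a_q followed by a_{not q}, one column per universe element.

    queries:   array (k, |X|), queries[j, x] = q_j(x) in [0, 1]
    data_hist: array (|X|,), normalized histogram of the private database D
    """
    q_on_D = queries @ data_hist             # q(D) for each query
    pos = queries - q_on_D[:, None]          # G(a_q, x)       = q(x) - q(D)
    neg = q_on_D[:, None] - queries          # G(a_{not q}, x) = q(D) - q(x)
    return np.vstack([pos, neg])

def worst_case_cost(G, data_strategy):
    """Largest expected cost any query action obtains against a mixed data strategy."""
    return float(np.max(G @ data_strategy))

# Hypothetical toy instance: 4 universe elements, 3 predicate queries.
queries = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
data_hist = np.array([0.5, 0.25, 0.25, 0.0])
G = game_matrix(queries, data_hist)
```

Against the uniform strategy the worst-case cost in this toy instance is the largest gap between a query's true answer and its answer on the uniform histogram, which is the sense in which an approximate equilibrium strategy is an accurate synthetic database.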

The standard problem of private linear query release corresponds to privately computing a $\rho$-approximate
equilibrium strategy to the above game, where privacy is preserved with respect to changing every entry in
the game matrix by at most $1/n$. The problem of query release while protecting one-query-to-many-analyst
privacy then corresponds to computing a $\rho$-approximate equilibrium strategy to the above game, where
privacy is with respect to an arbitrary change in a single row of the game matrix, corresponding to a single 
strategy of the query player. Our main result can therefore be viewed as an algorithm for privately comput- 
ing the equilibrium of a zero sum game while protecting the privacy of individual strategies of the players, 
which may be of independent interest. 

It is well-known that when two no-regret algorithms are played against each other in a two player zero-
sum game, their empirical play distributions quickly converge to an equilibrium. We use this to construct
a query release mechanism. That is, we attempt to compute an equilibrium of G by letting the query player 
and the data player play against each other using no-regret algorithms, and finally outputting the empirical 
play distribution of the data player. We face several obstacles. 

First, no-regret algorithms maintain a state — roughly, a distribution over actions — which is not itself 
privacy preserving. (In fact, it is computed deterministically from inputs that may depend on the data 
or queries.) Previous approaches have addressed this problem by adding noise to the inputs of the no- 
regret algorithm. We take a different approach, and use the fact that actions sampled from the distributions 
maintained by the multiplicative weights algorithm are privacy preserving. The property that samples from
the multiplicative weights distribution are privacy preserving follows from the fact that these distributions
are in fact a form of the private exponential mechanism. Note that this property is not used in the private multiplicative weights
mechanism of Hardt and Rothblum [HR10], who use the distribution itself as a hypothesis. Indeed, without 
the constraint of query privacy, any no-regret algorithm can be used in place of multiplicative weights 
[RR10, GRU12], which is not the case in our setting. 
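
The correspondence with the exponential mechanism can be written out explicitly. The following is a short sketch using the standard forms of both objects; the symbols $u$, $\Delta_u$ and $\varepsilon'$ are introduced here for illustration and are not notation from the paper.

```latex
% Multiplicative weights distribution after t rounds, started from the uniform measure:
\Pr[\tilde{A}_{t+1} = a]
  = \frac{\exp\bigl(-\eta \sum_{j \le t} L_j(a)\bigr)}
         {\sum_{a'} \exp\bigl(-\eta \sum_{j \le t} L_j(a')\bigr)}

% Exponential mechanism with score function u of sensitivity \Delta_u:
\Pr[\text{output } a] \propto \exp\bigl(\varepsilon' \, u(a) / (2 \Delta_u)\bigr)

% The two coincide for the score u(a) = -\sum_{j \le t} L_j(a) and \varepsilon' = 2 \eta \Delta_u.
```

Under this reading, a single sample from the multiplicative weights distribution inherits the privacy guarantee of the exponential mechanism, with a privacy parameter proportional to $\eta$ times the sensitivity of the cumulative loss.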

Second, for query privacy, we require that during the simulated play of the zero-sum game, the query 
player never plays a mixed strategy which is too concentrated on any single query. This ensures that the 
loss experienced at each round by the data player is insensitive to the change of any single query. To ensure
this, we force the query player to play mixed strategies only among the set of smooth distributions, that 
do not place too much weight on any single action. We accomplish this using Bregman projections. It is 
well-known that multiplicative weights coupled with a Bregman projection into a convex set K achieves 
no-regret to any strategy in K. 

The result of this simulation is an approximate equilibrium strategy for the data player in the sense that 
it achieves approximately the value of the game when played against all but s strategies of the query player, 
where s is the maximum probability that the query player may assign to any single query. This corresponds 
to a synthetic database that answers all but $s$ queries asked by any of the data analysts. We can release this
data structure to all analysts. Finally, because $s$ will be small, we can answer the queries of all data analysts
while preserving query privacy by using the sparse vector technique [DNR+09, RR10, HR10], paying only
an additional factor of $\sqrt{s}$ in the privacy parameter for the $s$ queries that were not correctly answered by the
synthetic data. The result is a nearly-optimal error rate of $O(1/\sqrt{n})$.

Our techniques naturally extend to one-analyst-to-many-analyst privacy by making the strategies of the 
query player correspond to entire workloads of queries, one for each analyst. We can also use this ap- 
proach to convert the private multiplicative weights algorithm of Hardt and Rothblum [HR10] into an online 
algorithm that preserves one-query-to-many-analyst privacy, and can also answer arbitrary low-sensitivity 
queries. These extensions both give first-of-their-kind results, but at some degradation in the accuracy
parameters: here we do not obtain $O(1/\sqrt{n})$ error rate. We leave it as an open problem to achieve $O(1/\sqrt{n})$
error for these types of mechanisms, or show that the reduction of accuracy is inherent.

1.2 Related Work 

There is an extremely large body of work on differential privacy [DMNS06] that we do not attempt to survey. 
We here summarize only the most related work. The study of differential privacy was initiated by a line of 
work [DN03, BDMN05, DMNS06] culminating in the definition in Dwork, McSherry, Nissim, and Smith
[DMNS06], who also introduced the basic technique of answering low-sensitivity queries using the Laplace 
mechanism. This allows nearly linearly many queries in the database size to be usefully answered while 
preserving differential privacy. 

A recent line of work [BLR08, DNR+09, DRV10, RR10, HR10, GHRU11, GRU12, HLM12] has shown
how to accurately answer nearly exponentially many queries while preserving differential privacy of
the data. Some of this work [RR10, HR10, GRU12] has achieved these results by using no-regret algorithms.
Notably, Dwork, Rothblum, and Vadhan [DRV10] introduced extremely useful composition theorems, and
Hardt and Rothblum [HR10] introduced the multiplicative weights technique to the differential privacy
literature, both of which we use crucially in this paper. In our work, we make use of the multiplicative
weights algorithm in a slightly different way than it has been used before. Here, we simulate play of
a 2-player zero-sum game using two copies of the multiplicative weights algorithm, and rely on the fast
convergence of such play to Nash equilibrium [FS96]. We also rely on the fact that Bregman projections
onto a convex set $K$ can be used in conjunction with the multiplicative weights update rule¹ to achieve no
regret with respect to the best element in the set $K$ [RS12]. Finally, we rely on the fact that samples from the
multiplicative weights distribution can be viewed as samples from the exponential mechanism of McSherry
and Talwar [MT07], and hence are privacy preserving.

Our use of Bregman projections into smooth distributions is similar to its use in smooth boosting. No- 
tably, Barak, Hardt, and Kale [BHK09] use Bregman projections in a similar way, and the weight capping 
used by Dwork, Rothblum, and Vadhan [DRV10] in their analysis of boosting for people can be viewed as
a Bregman projection. 

The most closely related paper to ours is the beautiful recent work of Dwork, Naor, and Vadhan [DNV12],
who introduce the idea of analyst privacy. This paper shows that any algorithm which can answer $\omega(n^2)$
queries to non-trivial accuracy requires that it maintain common state as it interacts with many data analysts,
and hence potentially violates the privacy of the analysts. They give a mechanism which promises many-
to-one analyst privacy, and achieves per-query error $O(n^{3/4})$ for linear queries. That is, their mechanism
promises differential privacy on the marginal distribution of answers given to any single analyst, even when
all other analysts change all of their queries. However, if multiple analysts collude, or if a single analyst can
falsely register under many ids, then the privacy guarantees degrade quickly, because privacy is not promised
on the joint distribution on all analysts' answers. Removing this limitation, as well as improving the error
bounds, and extending analyst privacy to non-linear queries are all stated as open questions. In this work,
we make progress on all of these questions. We achieve nearly-optimal error bounds of $O(n^{1/2})$ for
answering linear queries under the constraint of one-query-to-many-analyst privacy, which promises differential
privacy on the joint distribution over answers given to all analysts, when a single query has changed. We 
also show how to extend our techniques to get one-analyst-to-many-analyst privacy, which allows a single 
analyst to change all of her queries, and promises privacy even if all other analysts collude. We also show 

how to extend our techniques to answer arbitrary low-sensitivity queries.

¹ Indeed, they can be used in conjunction with any no-regret algorithm in the family of regularized empirical risk minimizers.

2 Preliminaries 

2.1 Differential Privacy and Analyst Differential Privacy 

Let a database $D \in \mathcal{X}^n$ be a collection of $n$ rows $x^{(1)}, \dots, x^{(n)}$ from a data universe $\mathcal{X}$. We say that two
databases $D, D' \in \mathcal{X}^n$ are adjacent if they differ only on a single row, and we denote this by $D \sim D'$.

A mechanism $\mathcal{A} : \mathcal{X}^n \to \mathcal{R}$ takes a database as input and outputs some data structure in $\mathcal{R}$. We are
interested in mechanisms that satisfy differential privacy.

Definition 2.1 (Differential Privacy [DMNS06]). A mechanism $\mathcal{A} : \mathcal{X}^n \to \mathcal{R}$ is $(\varepsilon, \delta)$-differentially private
if for every two adjacent databases $D \sim D' \in \mathcal{X}^n$ and every subset $S \subseteq \mathcal{R}$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D') \in S] + \delta.$$

In this work we construct mechanisms that ensure differential privacy for the analyst as well as for 
the database. To define analyst privacy, we need to define many-analyst mechanisms. Let $\mathcal{Q}$ be the set
of all allowable queries. The mechanism will take $m$ sets of queries $Q_1, \dots, Q_m$ and return $m$ outputs
$Z_1, \dots, Z_m$, where $Z_j$ should be thought of as answers to the queries $Q_j$. Thus, a many-analyst mechanism
has the syntax $\mathcal{A} : \mathcal{X}^n \times (\mathcal{Q}^*)^m \to \mathcal{R}^m$. Given sets of queries $Q_1, \dots, Q_m$, we use $Q = \bigcup_{j=1}^{m} Q_j$ to denote
the set of all queries. Because we wish to guarantee privacy even in the event of collusion, it will be useful
to refer to the output given to all analysts other than some analyst id. For each $\mathrm{id} \in [m]$ we write $\mathcal{A}(D, Q)_{-\mathrm{id}}$
to denote $(Z_1, \dots, Z_{\mathrm{id}-1}, Z_{\mathrm{id}+1}, \dots, Z_m)$, the output given to all analysts other than id.

Let $Q = Q_1, \dots, Q_m$ and $Q' = Q'_1, \dots, Q'_m$. We say that $Q$ and $Q'$ are analyst-adjacent if there exists
$\mathrm{id}^* \in [m]$ such that for every $\mathrm{id} \neq \mathrm{id}^*$, $Q_{\mathrm{id}} = Q'_{\mathrm{id}}$. That is, $Q \sim Q'$ are analyst adjacent if they differ only
on the queries asked by one analyst. Intuitively, we say that a mechanism satisfies one-analyst-to-many-
analyst privacy if changing all the queries asked by analyst $\mathrm{id}^*$ does not significantly affect the output given
to all analysts other than $\mathrm{id}^*$.

Definition 2.2 (One-Analyst-to-Many-Analyst Privacy). A many-analyst mechanism $\mathcal{A}$ satisfies $(\varepsilon, \delta)$-one-
analyst-to-many-analyst privacy if for every database $D \in \mathcal{X}^n$, every two analyst-adjacent query sequences
$Q \sim Q'$ that differ only on one set of queries $Q_{\mathrm{id}}, Q'_{\mathrm{id}}$, and every $S \subseteq \mathcal{R}^{m-1}$,

$$\Pr[\mathcal{A}(D, Q)_{-\mathrm{id}} \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D, Q')_{-\mathrm{id}} \in S] + \delta.$$

Let $Q = Q_1, \dots, Q_m$ and $Q' = Q'_1, \dots, Q'_m$. We say that $Q$ and $Q'$ are query-adjacent if there exists
$\mathrm{id}^*$ such that for every $\mathrm{id} \neq \mathrm{id}^*$, $Q_{\mathrm{id}} = Q'_{\mathrm{id}}$ and $|Q_{\mathrm{id}^*} \triangle Q'_{\mathrm{id}^*}| \le 1$. That is, $Q \sim Q'$ are query adjacent if they
differ only on one of the queries asked. Intuitively, we say that a mechanism satisfies one-query-to-many-
analyst privacy if changing one query asked by analyst $\mathrm{id}^*$ does not significantly affect the output given to
all analysts other than $\mathrm{id}^*$.

Definition 2.3 (One-Query-to-Many-Analyst Privacy). A many-analyst mechanism $\mathcal{A}$ satisfies $(\varepsilon, \delta)$-one-
query-to-many-analyst privacy if for every database $D \in \mathcal{X}^n$, every two query-adjacent query sequences
$Q \sim Q'$ that differ only on one query in $Q_{\mathrm{id}}, Q'_{\mathrm{id}}$, and every $S \subseteq \mathcal{R}^{m-1}$,

$$\Pr[\mathcal{A}(D, Q)_{-\mathrm{id}} \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(D, Q')_{-\mathrm{id}} \in S] + \delta.$$
In our proofs of both differential privacy and analyst privacy, we will often establish that two distributions
$M(D), M(D')$ (for any two adjacent inputs $D \sim D'$) are such that with probability at least $1 - \delta$ over
$y \leftarrow_R M(D)$,

$$\left| \ln \frac{\Pr[M(D) = y]}{\Pr[M(D') = y]} \right| \le \varepsilon.$$

We note that this condition is no weaker than $(\varepsilon, \delta)$-differential privacy [DRV10].



2.2 Queries and Accuracy 

Since a sanitizer that always outputs $\bot$ satisfies Definition 2.1, we also need to define what it means for a
sanitizer to be accurate. In this work we consider two types of queries: low-sensitivity queries and linear
queries. Low-sensitivity queries are parameterized by $\Delta \in [0, 1]$. A $\Delta$-sensitive query is any function
$q : \mathcal{X}^n \to [0, 1]$ such that

$$\max_{D \sim D'} |q(D) - q(D')| \le \Delta.$$

A linear query is a particular type of low-sensitivity query, specified by a function $q : \mathcal{X} \to [0, 1]$. We define
the evaluation of the query $q$ on a database $D \in \mathcal{X}^n$ to be

$$q(D) = \frac{1}{n} \sum_{i=1}^{n} q(x^{(i)}).$$

A linear query is $(1/n)$-sensitive.
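
As a small illustration of the definition, the sketch below evaluates a linear query on a toy database and on an adjacent one. The records and the predicate `over_30` are hypothetical examples chosen here, not data from the paper.

```python
def linear_query(predicate, database):
    """Evaluate q(D) = (1/n) * sum_i q(x_i) for a predicate q: X -> [0, 1]."""
    return sum(predicate(x) for x in database) / len(database)

def over_30(record):
    """Hypothetical predicate on (age, owns_car) records."""
    return 1.0 if record[0] > 30 else 0.0

# Two adjacent databases: they differ only in the last row.
D = [(25, True), (41, False), (37, True), (19, False)]
D_adj = [(25, True), (41, False), (37, True), (52, False)]

gap = abs(linear_query(over_30, D) - linear_query(over_30, D_adj))
```

Changing one row moves at most one term of the sum by at most 1, so the gap never exceeds $1/n$, which is the $(1/n)$-sensitivity stated above.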

Since $\mathcal{A}$ may output an arbitrary data structure, we must specify how to answer queries in $\mathcal{Q}$ from the
output $\mathcal{A}(D)$. Hence, we require that there is an evaluator $\mathcal{E} : \mathcal{R} \times \mathcal{Q} \to \mathbb{R}$ that estimates $q(D)$ from the
output of $\mathcal{A}(D)$. For example, if $\mathcal{A}$ outputs a vector of "noisy answers" $Z = (q(D) + Z_q)_{q \in \mathcal{Q}}$, where $Z_q$ is
a random variable for each $q \in \mathcal{Q}$, then $\mathcal{R} = \mathbb{R}^{\mathcal{Q}}$ and $\mathcal{E}(Z, q)$ is the $q$-th component of $Z$. Abusing notation,
we write $q(Z)$ and $q(\mathcal{A}(D))$ as shorthand for $\mathcal{E}(Z, q)$ and $\mathcal{E}(\mathcal{A}(D), q)$, respectively.

Definition 2.4 (Accuracy). An output $Z$ of a mechanism $\mathcal{A}(D)$ is $\alpha$-accurate for the query set $Q$ if
$|q(Z) - q(D)| \le \alpha$ for every $q \in Q$. A mechanism is $(\alpha, \beta)$-accurate for the query set $Q$ if for every database $D$,

$$\Pr[\forall q \in Q,\ |q(\mathcal{A}(D)) - q(D)| \le \alpha] \ge 1 - \beta,$$

where the probability is taken over the coins of $\mathcal{A}$.



2.3 Differential Privacy Tools 

We will make use of a few previously known differentially private mechanisms. When we need to answer 
a small number of queries we will use the well-known Laplace mechanism [DMNS06], with an improved 
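analysis from [DRV10].

Lemma 2.5 (Laplace Mechanism). Let $\mathcal{F} = \{f_1, \dots, f_{|\mathcal{F}|}\}$, $f_i : \mathcal{X}^n \to [0, 1]$, be a set of $\Delta$-sensitive
queries, and let $D \in \mathcal{X}^n$ be a database. Let $\varepsilon, \delta \le 1$. Then the mechanism $\mathcal{A}_{\mathrm{Lap}}(D, \mathcal{F})$ that outputs
$f_i(D) + Z_i$, where each $Z_i$ is independent Laplace noise, for every $f_i \in \mathcal{F}$

1. is $(\varepsilon, \delta)$-differentially private, and

2. is $(\alpha, \beta)$-accurate for any $\beta \in (0, 1]$ and $\alpha = \varepsilon^{-1} \Delta \sqrt{8 |\mathcal{F}| \log(1/\delta)} \, \log(|\mathcal{F}|/\beta)$.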

When we need to answer a large number of queries, we will use the multiplicative weights mechanism 
from [HR10], with an improved analysis from Gupta et al. [GRU12]. 

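Lemma 2.6 (Multiplicative Weights Mechanism). Let $\mathcal{F} = \{f_1, \dots, f_{|\mathcal{F}|}\}$, $f_i : \mathcal{X}^n \to [0, 1]$, be a set of
$\Delta$-sensitive functions. Let $D \in \mathcal{X}^n$ be a database. Then the multiplicative weights mechanism $\mathcal{A}_{\mathrm{MW}}(D, \mathcal{F})$
is

1. $(\varepsilon, \delta)$-differentially private, and

2. $(\alpha, \beta)$-accurate for any $\beta \in (0, 1]$ and $\alpha = O\left(\log^{1/4}|\mathcal{X}| \sqrt{\varepsilon^{-1} \Delta \log(|\mathcal{F}|/\beta) \log(1/\delta)}\right)$.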

Our algorithms make use of the private sparse vector algorithm. The algorithm takes as input a database 
and a large set of low- sensitivity queries, with the promise that only a small number of those queries have 
large answer on the input database. Its output is a set of queries that contains only those whose answer is 
large on the input database. We also note that in this setting, the sparse vector algorithm ensures the privacy 
of the input queries in a strong sense. 

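Lemma 2.7 ("Offline Sparse Vector Mechanism" [HR10, Rot11]). Let $\mathcal{F} = \{f_1, \dots, f_{|\mathcal{F}|}\}$, $f_i : \mathcal{X}^n \to [0, 1]$,
be a set of $\Delta$-sensitive functions. Let $D \in \mathcal{X}^n$ be a database, $\alpha \in (0, 1]$, $k \in [|\mathcal{F}|]$ such that

$$|\{i \mid f_i(D) > \alpha\}| \le k.$$

Then there is an algorithm $\mathcal{A}_{\mathrm{SV}}(D, \mathcal{F})$ that

1. is $(\varepsilon, \delta)$-differentially private with respect to $D$,

2. returns $I \subseteq [|\mathcal{F}|]$ of size at most $k$ such that with probability at least $1 - \beta$,

$$\{i \mid f_i(D) > \alpha + \varepsilon^{-1} \Delta \sqrt{8 k \log(1/\delta)} \log(|\mathcal{F}|/\beta)\} \subseteq I \subseteq \{i \mid f_i(D) > \alpha\},$$

3. and is perfectly private with respect to $\mathcal{F}$: if $\mathcal{F}' = \{f_1, \dots, f'_j, \dots, f_{|\mathcal{F}|}\}$, then for every $D$ and $i \neq j$,

$$\Pr[i \in \mathcal{A}_{\mathrm{SV}}(D, \mathcal{F})] = \Pr[i \in \mathcal{A}_{\mathrm{SV}}(D, \mathcal{F}')].$$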

We also will use the Composition Theorem of Dwork, Rothblum, and Vadhan [DRV10].

Lemma 2.8 ($T$-Fold Adaptive Composition). Let $\mathcal{A} : \mathcal{X}^* \to \mathcal{R}^T$ be a mechanism such that for every pair
of adjacent inputs $x \sim x'$, every $t \in [T]$, every $r_1, \dots, r_{t-1} \in \mathcal{R}$, and every $r_t \in \mathcal{R}$,

$$\Pr[\mathcal{A}_t(x) = r_t \mid \mathcal{A}_{1,\dots,t-1}(x) = r_1, \dots, r_{t-1}]
\le e^{\varepsilon_0} \Pr[\mathcal{A}_t(x') = r_t \mid \mathcal{A}_{1,\dots,t-1}(x') = r_1, \dots, r_{t-1}] + \delta_0$$

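for $\varepsilon_0 \le 1/2$. Then $\mathcal{A}$ is $(\varepsilon, \delta)$-differentially private for $\varepsilon = \varepsilon_0 \sqrt{8T \log(1/\delta)} + 2\varepsilon_0^2 T$ and $\delta = \delta_0 T$.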
2.4 Multiplicative Weights

Let $A : \mathcal{A} \to [0, 1]$ be a measure over a set of actions $\mathcal{A}$. We use $|A| = \sum_{a \in \mathcal{A}} A(a)$ to denote the density of
$A$. A measure naturally corresponds to a probability distribution $\tilde{A}$ in which

$$\Pr[\tilde{A} = a] = A(a)/|A|$$

for every $a \in \mathcal{A}$. Throughout, we will use calligraphic letters ($\mathcal{A}$) to denote sets of actions, lower case letters
($a$) to denote the actions, capital letters ($A$) to denote measures over actions, and capital letters with a tilde
to denote the corresponding distribution ($\tilde{A}$).

We will use the KL-divergence between two distributions, defined to be

$$KL(\tilde{A} \| \tilde{A}') = \sum_{a \in \mathcal{A}} \tilde{A}(a) \log\left(\tilde{A}(a)/\tilde{A}'(a)\right).$$

Let $L : \mathcal{A} \to [0, 1]$ be a loss function (losses $L$). Abusing notation, we can define $L(A) = \mathbb{E}_{a \leftarrow_R \tilde{A}}[L(a)]$.
Given an initial measure $A_1$, we can define the multiplicative weights algorithm as follows:

Algorithm 1 The Multiplicative Weights Algorithm, $MW_\eta$
For $t = 1, 2, \dots, T$:
    Sample $a_t \leftarrow_R \tilde{A}_t$
    Receive losses $L_t$, which may depend on $A_1, a_1, \dots, A_{t-1}, a_{t-1}$
    Update: Let $A_{t+1}$ be such that $A_{t+1}(a) = e^{-\eta L_t(a)} A_t(a)$ for every $a \in \mathcal{A}$
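
Algorithm 1 can be sketched in a few lines. This is an illustrative implementation only: the function name, the `loss_fn` callback interface and the use of NumPy are assumptions made here, not part of the paper.

```python
import numpy as np

def multiplicative_weights(loss_fn, num_actions, T, eta, rng):
    """Run MW_eta for T rounds and return the sampled actions and the final measure.

    loss_fn(t, history) returns an array of losses in [0, 1], one per action.
    history is the list of (measure, action) pairs from earlier rounds only.
    """
    A = np.full(num_actions, 1.0 / num_actions)      # uniform measure of density 1
    history, actions = [], []
    for t in range(T):
        a = rng.choice(num_actions, p=A / A.sum())   # sample from the distribution of A_t
        losses = np.asarray(loss_fn(t, history), dtype=float)
        history.append((A.copy(), a))
        actions.append(a)
        A = np.exp(-eta * losses) * A                # A_{t+1}(a) = exp(-eta * L_t(a)) * A_t(a)
    return actions, A

# Usage with a fixed (hypothetical) loss vector: action 0 never incurs loss.
actions, final = multiplicative_weights(
    lambda t, h: np.array([0.0, 1.0, 1.0]), 3, 50, 0.5, np.random.default_rng(0)
)
```

The measure only ever shrinks, so its density stays at most the initial density; the sampled action uses the normalized measure, matching the correspondence between $A$ and $\tilde{A}$ defined above.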



The following theorem about the multiplicative weights update is well-known. 

Theorem 2.9 (Multiplicative Weights. See e.g. [RS12]). Let $A_1$ be the uniform measure of density 1, and
let $\{a_1, \dots, a_T\}$ be the actions obtained by $MW_\eta$ with losses $\{L_1, \dots, L_T\}$. Let $A^* = \mathbb{1}_{a = a^*}$, for some
$a^* \in \mathcal{A}$, and $\beta \in (0, 1]$. Then with probability at least $1 - \beta$,

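$$\mathbb{E}_{t \leftarrow_R [T]}[L_t(a_t)] \le (1 + \eta) \, \mathbb{E}_{t \leftarrow_R [T]}[L_t(A^*)] + \frac{KL(\tilde{A}^* \| \tilde{A}_1)}{\eta T} + \frac{4 \log(1/\beta)}{\sqrt{T}}.$$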

We need to work with a variant of multiplicative weights that only produces measures A of high density, 
which will imply that A does not assign too much probability to any single element of A. To this end, 
we will apply (a special case of) the Bregman projection to the measures obtained from the multiplicative 
weights update rule, which we now define. 

Definition 2.10 (Bregman Projection). Let $s \in (0, |\mathcal{A}|]$ be a density. Given a measure $A$ such that $|A| \le s$,
the (Bregman) projection of $A$ into the set of density-$s$ measures is the measure $\Gamma_s A$ obtained by computing
$c \ge 1$ such that $s = \sum_{a \in \mathcal{A}} \min\{1, cA(a)\}$ and setting $\Gamma_s A(a) = \min\{1, cA(a)\}$ for every $a \in \mathcal{A}$.
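
One way to compute this projection is to search for the scaling constant $c$ by bisection, since $\sum_a \min\{1, cA(a)\}$ is continuous and non-decreasing in $c$. The sketch below is an assumption-laden illustration (the function name, the bisection approach and the iteration count are choices made here; the paper does not prescribe how $c$ is found).

```python
import numpy as np

def project_to_density(A, s, iters=200):
    """Return the measure min{1, c*A(a)} with c >= 1 chosen so that its density is s.

    Requires |A| <= s <= (number of actions with positive weight); otherwise no
    such c exists.
    """
    A = np.asarray(A, dtype=float)
    if not (A.sum() <= s <= np.count_nonzero(A)):
        raise ValueError("need |A| <= s <= number of positive entries")
    lo, hi = 1.0, 1.0 / A[A > 0].min()   # at c = hi every positive entry is capped at 1
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if np.minimum(1.0, mid * A).sum() < s:
            lo = mid                      # density still too small: scale up
        else:
            hi = mid                      # density reached: tighten from above
    return np.minimum(1.0, hi * A)

projected = project_to_density([0.9, 0.05, 0.05], s=2)   # hypothetical input measure
```

Because every entry of the projected measure is at most 1 and the density is $s$, the corresponding distribution assigns probability at most $1/s$ to any single action, which is the smoothness property the query player needs.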

Given an initial measure $A_1$ such that $|A_1| \le s$, we can define the dense multiplicative weights algorithm
as follows:

Note that we update the unprojected measure $A_t$, but sample $a_t$ using the projected measure $\Gamma_s A_t$.
Observe that the update step can only decrease the density, so we will indeed have $|A_t| \le s$ for every $t$.
Algorithm 2 The Dense Multiplicative Weights Algorithm, $DMW_{s,\eta}$
For $t = 1, 2, \dots, T$:
    Let $A'_t = \Gamma_s A_t$, and sample $a_t \leftarrow_R \tilde{A}'_t$
    Receive losses $L_t$, which may depend on $A_1, a_1, \dots, A_{t-1}, a_{t-1}$
    Update: Let $A_{t+1}$ be such that $A_{t+1}(a) = e^{-\eta L_t(a)} A_t(a)$ for every $a \in \mathcal{A}$



Similarly, given a sequence of losses $\{L_1, \dots, L_T\}$, and an initial measure $A_1$ of density $s$, we can consider
the sequence $\{A_1, \dots, A_T\}$ where $A_{t+1}$ is given by the projected multiplicative weights update applied to
$A_t, L_t$. The following theorem is known.

Theorem 2.11 (Multiplicative Weights with Projections. See e.g. [RS12]). Let $A_1$ be the uniform measure of
density 1 and let $\{a_1, \dots, a_T\}$ be the sequence of actions obtained by $DMW_{s,\eta}$ with losses $\{L_1, \dots, L_T\}$.
Let $A^* = \mathbb{1}_{a \in S^*}$ for some set $S^* \subseteq \mathcal{A}$ of size $s$, and $\beta \in (0, 1]$. Then with probability $1 - \beta$,

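$$\mathbb{E}_{t \leftarrow_R [T]}[L_t(a_t)] \le (1 + \eta) \, \mathbb{E}_{t \leftarrow_R [T]}[L_t(A^*)] + \frac{KL(\tilde{A}^* \| \tilde{A}_1)}{\eta T} + \frac{4 \log(1/\beta)}{\sqrt{T}}.$$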

2.5 Regret Minimization and Two-Player Zero-Sum Games 

Let $G : \mathcal{A}_R \times \mathcal{A}_C \to [0, 1]$ be a two-player zero-sum game between players (R)ow and (C)olumn. The
players take actions $r \in \mathcal{A}_R$ and $c \in \mathcal{A}_C$, and player R receives loss $G(r, c)$ while player C receives
loss $-G(r, c)$. Let $\Delta(\mathcal{A}_R), \Delta(\mathcal{A}_C)$ be the set of measures over actions in $\mathcal{A}_R$ and $\mathcal{A}_C$, respectively. The
well-known minimax theorem states that

$$\min_{R \in \Delta(\mathcal{A}_R)} \max_{C \in \Delta(\mathcal{A}_C)} G(R, C) = \max_{C \in \Delta(\mathcal{A}_C)} \min_{R \in \Delta(\mathcal{A}_R)} G(R, C) := v,$$

where $v$ is defined to be the value of the game.

Freund and Schapire [FS96] showed that if two sequences of actions $\{r_1, \dots, r_T\}$, $\{c_1, \dots, c_T\}$ are
"no-regret with respect to one another", then $\bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t$ and $\bar{c} = \frac{1}{T} \sum_{t=1}^{T} c_t$ form an approximate
equilibrium strategy pair. More formally, if

$$\mathbb{E}_t[G(r_t, c_t)] \le \min_{r} \mathbb{E}_t[G(r, c_t)] + \rho \quad \text{and} \quad \mathbb{E}_t[G(r_t, c_t)] \ge \max_{c} \mathbb{E}_t[G(r_t, c)] - \rho$$

then

$$v - 2\rho \le G(\bar{r}, \bar{c}) \le v + 2\rho.$$

As a consequence, if Row chooses actions using the multiplicative weights rule with losses $L_t(r_t) = G(r_t, c_t)$
and Column chooses actions using the multiplicative weights rule with losses $L_t(c_t) = -G(r_t, c_t)$,
then each player's distribution on actions is converging to a minimax strategy. That is, if we play for
sufficiently many rounds so that both players have regret at most $\rho$:

$$\max_{c} G(\bar{r}, c) \le v + 2\rho, \qquad v - 2\rho \le \min_{r} G(r, \bar{c}).$$
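
The convergence of empirical play can be observed in a small simulation. The sketch below is illustrative only: the function name, the 2x2 game, the step size and the number of rounds are assumptions chosen here. Both players run (unprojected) multiplicative weights on the sampled action of the opponent, and the empirical play distributions are returned.

```python
import numpy as np

def self_play(G, T, eta, rng):
    """Row minimizes G and Column maximizes G; both use multiplicative weights."""
    n_r, n_c = G.shape
    R, C = np.full(n_r, 1.0 / n_r), np.full(n_c, 1.0 / n_c)
    r_counts, c_counts = np.zeros(n_r), np.zeros(n_c)
    for _ in range(T):
        r = rng.choice(n_r, p=R)
        c = rng.choice(n_c, p=C)
        r_counts[r] += 1
        c_counts[c] += 1
        R = R * np.exp(-eta * G[:, c])           # Row's loss is G(., c_t)
        C = C * np.exp(-eta * (1.0 - G[r, :]))   # Column's loss -G(r_t, .), shifted into [0, 1]
        R, C = R / R.sum(), C / C.sum()          # renormalize; the distributions are unchanged
    return r_counts / T, c_counts / T

# Hypothetical game with value v = 1/2 (a matching-pennies variant with costs in [0, 1]).
G = np.array([[1.0, 0.0], [0.0, 1.0]])
T = 10000
r_bar, c_bar = self_play(G, T, np.sqrt(np.log(2) / T), np.random.default_rng(0))
```

Shifting Column's loss by the constant 1 keeps it in $[0, 1]$ without changing the sampled distribution, since a common factor cancels on normalization. With these settings the empirical strategies should be close to the minimax strategies of this game, in line with the two inequalities above.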

In our view of query release as a two player game, the row player has actions corresponding to possible 
databases, and the column player has actions corresponding to queries. Row aims to minimize error on the 
query played by Column, and Column tries to find a query that maximizes error. Intuitively, to preserve
query privacy, Column must not put too much weight on any single query. 

Thus, we need an analogue of this result in the case where Column is not choosing actions according to 
the multiplicative weights update, but rather using the projected multiplicative weights update. In this case 
we cannot hope to obtain an approximate minimax strategy, since Column cannot play any single action 
with significant probability. However, we can define an alternative notion of the value of a game in which 
Column is not required to play any single action too often. Specifically, let $\Delta_s(\mathcal{A}_C)$ be the set of measures
over $\mathcal{A}_C$ of minimum density at least $s$. We define

$$v_s := \min_{R \in \Delta(\mathcal{A}_R)} \max_{C \in \Delta_s(\mathcal{A}_C)} G(R, C).$$

Notice that $v_s \le v$, but $v_s$ can be very different from $v$.

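Theorem 2.12. Let $\{r_1, \dots, r_T\} \in \mathcal{A}_R$ be a sequence of row-player actions. Let $\{C_1, \dots, C_T\} \in \Delta_s(\mathcal{A}_C)$
be a sequence of high-density measures over column-player actions, and $\{c_1, \dots, c_T\} \in \mathcal{A}_C$ be a sequence
of column-player actions such that $c_t \leftarrow_R \tilde{C}_t$ for every $t \in [T]$. Further, suppose that

$$\mathbb{E}_t[G(r_t, c_t)] \le \min_{R \in \Delta(\mathcal{A}_R)} \mathbb{E}_t[G(R, c_t)] + \rho \quad \text{and} \quad \mathbb{E}_t[G(r_t, c_t)] \ge \max_{C \in \Delta_s(\mathcal{A}_C)} \mathbb{E}_t[G(r_t, C)] - \rho.$$

Then

$$v_s - 2\rho \le G(\bar{r}, \bar{c}) \le v + 2\rho.$$

Moreover, $\bar{r}$ is an approximate min-max strategy with respect to strategies in $\Delta_s(\mathcal{A}_C)$:

$$v_s - 2\rho \le \max_{C \in \Delta_s(\mathcal{A}_C)} G(\bar{r}, C) \le v + 2\rho.$$

Proof. For the first set of inequalities, we handle each part separately. For one direction

$$v_s = \min_{R \in \Delta(\mathcal{A}_R)} \max_{C \in \Delta_s(\mathcal{A}_C)} G(R, C)
\le \max_{C \in \Delta_s(\mathcal{A}_C)} \mathbb{E}_t[G(r_t, C)] \le \mathbb{E}_t[G(r_t, c_t)] + \rho$$
$$\le \min_{R \in \Delta(\mathcal{A}_R)} \mathbb{E}_t[G(R, c_t)] + 2\rho = \min_{R \in \Delta(\mathcal{A}_R)} G(R, \bar{c}) + 2\rho
\le G(\bar{r}, \bar{c}) + 2\rho.$$

The other direction is similar, starting with the fact that $v = \max_{C \in \Delta(\mathcal{A}_C)} \min_{R \in \Delta(\mathcal{A}_R)} G(R, C)$.

For the second set of inequalities, we also handle the two cases separately. For the upper bound

$$\max_{C \in \Delta_s(\mathcal{A}_C)} \mathbb{E}_t[G(r_t, C)] \le \mathbb{E}_t[G(r_t, c_t)] + \rho
\le \min_{R \in \Delta(\mathcal{A}_R)} \mathbb{E}_t[G(R, c_t)] + 2\rho = \min_{R \in \Delta(\mathcal{A}_R)} G(R, \bar{c}) + 2\rho
\le v + 2\rho.$$

For the lower bound

$$\max_{C \in \Delta_s(\mathcal{A}_C)} G(\bar{r}, C) \ge \mathbb{E}_t[G(\bar{r}, c_t)] \ge v_s - 2\rho.$$

This completes the proof of the theorem. □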
Corollary 2.13. Let $G : A_R \times A_C \to [0, 1]$. If the row player chooses actions $\{r_1, \dots, r_T\}$ by running $\mathrm{MW}_\eta$ with loss functions $L_t(r) = G(r, c_t)$ and the column player chooses actions $\{c_1, \dots, c_T\}$ by running $\mathrm{DMW}_{s,\eta}$ with the loss functions $L_t(c) = -G(r_t, c)$, then with probability at least $1 - \beta$,

$$v_s - 2\rho \le \max_{C \in \Delta_s(A_C)} G(\bar r, C) \le v + 2\rho$$

for $\rho = \eta + \cdots$.

3 One-Query-To-Many-Analyst Privacy

3.1 An Offline Mechanism for Counting Queries

Now we can define our offline mechanisms for releasing linear queries.

Algorithm 3 Offline Mechanism for Counting Queries w/ One-Query-to-Many-Analyst Privacy
Input: Database $D \in \mathcal{X}^n$ and sets of linear queries $\mathcal{Q}_1, \dots, \mathcal{Q}_m$.

Initialize: Let $\mathcal{Q} = \bigcup_{j=1}^{m} \mathcal{Q}_j \cup \neg\mathcal{Q}_j$. Let $D_0(x) = 1/|\mathcal{X}|$ for every $x \in \mathcal{X}$, $Q_0(q) = 1/|\mathcal{Q}|$ for every $q \in \mathcal{Q}$,

$$T = n \cdot \max\{\log|\mathcal{X}|, \log|\mathcal{Q}|\}, \qquad \eta = \frac{\varepsilon}{2\sqrt{T\log(1/\delta)}}, \qquad s = 12T.$$
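DataPlayer:

  On input a query $q_t$
    For every $x \in \mathcal{X}$, let $D_t(x) = D_{t-1}(x) \cdot \exp\left(-\eta\,\frac{1 + q_t(D) - q_t(x)}{2}\right)$
    Choose $x_t \leftarrow_R D_t$ and send $x_t$ to QueryPlayer

QueryPlayer:

  On input a data element $x_t$
    Let $Q_{t+1}(q) = Q_t(q) \cdot \exp\left(-\eta\,\frac{1 + q(D) - q(x_t)}{2}\right)$ for $q \in \mathcal{Q}$ and let $P_{t+1} = \Gamma_s Q_{t+1}$
    Choose $q_{t+1} \leftarrow_R P_{t+1}$ and send $q_{t+1}$ to DataPlayer

GenerateSynopsis:

  Let $\hat D = (x_1, \dots, x_T)$
  Run sparse vector on $\hat D$ to obtain a set of at most $s$ queries $\mathcal{Q}_{final}$
  Using the Laplace Mechanism, obtain an answer $a_q$ for each $q \in \mathcal{Q}_{final}$
  Output $\hat D$ to everyone and for each $q \in \mathcal{Q}_{final}$ output $(q, a_q)$ to the analyst that issued $q$.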
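As a rough illustration of the Initialize step, the following sketch computes the three parameters of Algorithm 3, assuming the settings read as $T = n \cdot \max\{\log|\mathcal{X}|, \log|\mathcal{Q}|\}$, $\eta = \varepsilon/(2\sqrt{T\log(1/\delta)})$ and $s = 12T$. The function name, the rounding of $T$ up to an integer, and the example values are assumptions made for illustration only.

```python
import math


def algorithm3_parameters(n, universe_size, num_queries, eps, delta):
    """Return (T, eta, s) under the parameter settings assumed in the lead-in.

    num_queries is |Q| for the negation-closed query set Q.
    """
    # T = n * max{log|X|, log|Q|}; rounding up to an integer round count is an assumption.
    T = math.ceil(n * max(math.log(universe_size), math.log(num_queries)))
    # eta = eps / (2 * sqrt(T * log(1/delta)))
    eta = eps / (2.0 * math.sqrt(T * math.log(1.0 / delta)))
    # s = 12T is the density parameter used by the projection and by sparse vector.
    s = 12 * T
    return T, eta, s


T, eta, s = algorithm3_parameters(
    n=1000, universe_size=2**10, num_queries=2**12, eps=0.5, delta=1e-6
)
print(T, s, eta)
```

With these hypothetical inputs the number of rounds is governed by $\log|\mathcal{Q}|$, since the query set is larger than the data universe.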



3.1.1 Accuracy Analysis 



Theorem 3.1. The offline algorithm for linear queries is $(\alpha, \beta)$-accurate for

$$\alpha = O\left(\frac{\sqrt{\log(|\mathcal{X}| + |\mathcal{Q}|)\log(1/\delta)}\,\log(|\mathcal{Q}|/\beta)}{\varepsilon\sqrt{n}}\right).$$

Proof. Observe that the algorithm is computing an approximate equilibrium of the game $G_D(x, q) = \frac{1 + q(D) - q(x)}{2}$. Let $v, v_s$ be the value and constrained value of this game, respectively. First, we pin down the quantities $v$ and $v_s$.

Claim 3.2. For every $D$, the value and constrained value of $G_D$ is $1/2$.

Proof of Claim 3.2. It's clear that the value (and hence constrained value) is at most $1/2$, because

$$\min_x \max_q \frac{1 + q(D) - q(x)}{2} \le \max_q \frac{1 + q(D) - q(D)}{2} = \frac{1}{2}.$$

Suppose we choose $x$ such that $(1 + q(D) - q(x))/2 < 1/2$ for some $q \in \mathcal{Q}$. Then, since the query $q' = 1 - q$ is also in $\mathcal{Q}$, $(1 + q'(D) - q'(x))/2 > 1/2$. But then $\max_{q \in \mathcal{Q}} (1 + q(D) - q(x))/2 > 1/2$, so the value of the game is at least $1/2$.

For the constrained value, suppose we choose $x$ such that $\mathbb{E}_{q \leftarrow_R Q}[(1 + q(D) - q(x))/2] < 1/2$ for some $Q \in \Delta_s(\mathcal{Q})$. Then we can flip every query in $Q$ to get a new distribution $Q'$ such that $\mathbb{E}_{q \leftarrow_R Q'}[(1 + q(D) - q(x))/2] > 1/2$. So $v_s \ge 1/2$ as well. □
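The negation argument in Claim 3.2 can be checked numerically on a toy instance. The sketch below is only an illustration: the four-element universe, the database histogram `D`, and the query vectors are made-up values, and queries are represented as 0/1 vectors so that $q(D)$ is an inner product.

```python
def game_payoff(q, D, x):
    """G_D(x, q) = (1 + q(D) - q(x)) / 2 for a 0/1 query vector q and histogram D."""
    q_of_D = sum(qi * di for qi, di in zip(q, D))
    return (1.0 + q_of_D - q[x]) / 2.0


def negation_closure(queries):
    """Add the negation 1 - q of every query, as in Q = union of Q_j and not-Q_j."""
    closed = set()
    for q in queries:
        closed.add(tuple(q))
        closed.add(tuple(1 - v for v in q))
    return sorted(closed)


# Hypothetical database (as a distribution over 4 universe elements) and queries.
D = [0.5, 0.25, 0.25, 0.0]
queries = negation_closure([(1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0)])

# For every pure data-player action x, the best query response pays at least 1/2 ...
worst_case = [max(game_payoff(q, D, x) for q in queries) for x in range(len(D))]
# ... while playing x ~ D pays exactly 1/2 against every query.
mixed = [sum(D[x] * game_payoff(q, D, x) for x in range(len(D))) for q in queries]
print(worst_case, mixed)
```

The payoffs of $q$ and $1 - q$ against the same $x$ always sum to 1, which is why the maximum over a negation-closed set can never drop below $1/2$.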

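Let $\hat D = \frac{1}{T}\sum_{t=1}^{T} x_t$. By Corollary 2.13,

$$v_s - 2\rho \le \max_{Q \in \Delta_s(\mathcal{Q})} \mathbb{E}_{q \leftarrow_R Q}\left[\frac{1 + q(D) - q(\hat D)}{2}\right] \le v + 2\rho.$$

Applying Claim 3.2 and rearranging terms, we have

$$\begin{aligned}
\max_{Q \in \Delta_s(\mathcal{Q})} \mathbb{E}_{q \leftarrow_R Q}\left[\,|q(D) - q(\hat D)|\,\right]
&= \max_{Q \in \Delta_s(\mathcal{Q})} \mathbb{E}_{q \leftarrow_R Q}\left[q(D) - q(\hat D)\right] \le 4\rho \\
&= 4\left(\eta + \frac{\max\{\log|\mathcal{X}|, \log|\mathcal{Q}|\}}{\eta T} + \frac{4\log(2/\beta)}{\sqrt{T}}\right) \\
&= O\left(\frac{\sqrt{\log(|\mathcal{X}| + |\mathcal{Q}|)\log(1/\delta)} + \log(1/\beta)}{\varepsilon\sqrt{n}}\right) =: \alpha_{\hat D}.
\end{aligned}$$

The previous statement suffices to show that $|q(D) - q(\hat D)| \le \alpha_{\hat D}$ for all but $s$ queries. Otherwise, the uniform distribution over the queries for which the error bound of $\alpha_{\hat D}$ does not hold would be a distribution over queries, contained in $\Delta_s(\mathcal{Q})$, with expected error larger than $\alpha_{\hat D}$.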

Since there are at most $s$ such queries, we can run the sparse vector algorithm (Lemma 2.7), and, with probability at least $1 - \beta/3$, it will identify every query $q$ with error larger than $\alpha_{\hat D} + \alpha_{SV}$ for

$$\alpha_{SV} = O\left(\frac{\sqrt{s\log(1/\delta)}\,\log(|\mathcal{Q}|/\beta)}{\varepsilon n}\right).$$

There are at most $s$ such queries. Thus, with probability at least $1 - \beta/3$, the Laplace mechanism (Lemma 2.7) answers these $\le s$ queries to within error

$$\alpha_{Lap} = O\left(\frac{\sqrt{s\log(1/\delta)}\,\log(s/\beta)}{\varepsilon n}\right).$$
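For readers who want to see the final answering step concretely, here is a minimal sketch of the Laplace mechanism applied to a handful of counting queries. It is not the paper's parameterization: the per-query privacy parameter `eps_per_query` is a hypothetical input (how it is derived from $\varepsilon$, $\delta$ and $s$ by composition is not shown), and queries are modeled as Python predicates over rows.

```python
import random


def counting_query(q, rows):
    """Fraction of rows satisfying the predicate q (a linear/counting query)."""
    return sum(1 for row in rows if q(row)) / len(rows)


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_answers(queries, rows, eps_per_query, rng):
    """Answer each query with independent Laplace noise.

    Changing one of the n rows moves a counting query by at most 1/n,
    so the noise scale is 1 / (n * eps_per_query).
    """
    scale = 1.0 / (len(rows) * eps_per_query)
    return [counting_query(q, rows) + laplace_noise(scale, rng) for q in queries]


rows = list(range(100))                         # toy database of 100 rows
queries = [lambda r: r % 2 == 0, lambda r: r < 10]
noisy = laplace_answers(queries, rows, eps_per_query=0.5, rng=random.Random(0))
print(noisy)
```

Because the noise scale shrinks like $1/n$, answering roughly $s$ queries this way costs error that grows with $\sqrt{s}$ under advanced composition, which matches the shape of $\alpha_{Lap}$.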
Now, observe that in the final output, there are two ways that a query can be answered: either it is answered with $\hat D$, in which case its answer can have error as large as $\alpha_{\hat D} + \alpha_{SV}$, or it is answered with the Laplace mechanism, in which case its answer can have error as large as $\alpha_{Lap}$. Thus, with probability at least $1 - \beta$, every query has error at most $\max\{\alpha_{\hat D} + \alpha_{SV}, \alpha_{Lap}\}$. Substituting our choice of $s = 12T = O(n\log(|\mathcal{X}| + |\mathcal{Q}|))$ and simplifying, we conclude that the mechanism is $(\alpha, \beta)$-accurate for

$$\alpha = O\left(\frac{\sqrt{\log(|\mathcal{X}| + |\mathcal{Q}|)\log(1/\delta)}\,\log(|\mathcal{Q}|/\beta)}{\varepsilon\sqrt{n}}\right).$$

□



3.1.2 Data Privacy 

Theorem 3.3. Algorithm 3 satisfies $(\varepsilon, \delta)$-differential privacy for the data.

Before proving the theorem, we will state a useful lemma about the Bregman projection onto the set of 
high density measures (Definition 2.10). 

Lemma 3.4 (Projection preserves differential privacy). Let $A_0, A_1 : \mathcal{A} \to [0, 1]$ be two full-support measures over a set of actions $\mathcal{A}$ and $s \in (0, |\mathcal{A}|)$ be such that 1) $|A_0|, |A_1| \le s$ and 2) $|\ln(A_0(a)/A_1(a))| \le \varepsilon$ for every $a \in \mathcal{A}$. Let $A_0' = \Gamma_s A_0$ and $A_1' = \Gamma_s A_1$. Then $|\ln(A_0'(a)/A_1'(a))| \le 2\varepsilon$ for every $a \in \mathcal{A}$.

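Proof of Lemma 3.4. Recall that to compute $A' = \Gamma_s A$, we find a "scaling factor" $c \ge 1$ such that $\sum_{a \in \mathcal{A}} \min\{1, cA(a)\} = s$ and set $A'(a) = \min\{1, cA(a)\}$. Let $c_0$ and $c_1$ be the scaling factors for $A_0'$ and $A_1'$ respectively. Assume without loss of generality that $c_0 \le c_1$. First, observe that

$$\left|\ln\frac{\min\{1, c_0 A_0(a)\}}{\min\{1, c_0 A_1(a)\}}\right| \le \left|\ln\frac{A_0(a)}{A_1(a)}\right| \le \varepsilon$$

for every $a \in \mathcal{A}$. Second, we observe that $c_1/c_0 \le e^{\varepsilon}$. If this were not the case, then we'd have $c_1 A_1(a) > c_0 A_1(a) e^{\varepsilon} \ge c_0 A_0(a)$ for every $a \in \mathcal{A}$, with strict inequality for at least one $a$. But then we'd have

$$\sum_{a \in \mathcal{A}} \min\{1, c_1 A_1(a)\} > \sum_{a \in \mathcal{A}} \min\{1, c_0 A_0(a)\} = s,$$

which would contradict the choice of $c_1$. Thus,

$$\left|\ln\frac{\min\{1, c_0 A_0(a)\}}{\min\{1, c_1 A_1(a)\}}\right| \le \left|\ln\frac{\min\{1, c_0 A_0(a)\}}{\min\{1, c_0 A_1(a)\}}\right| + \left|\ln\frac{\min\{1, c_0 A_1(a)\}}{\min\{1, c_1 A_1(a)\}}\right| \le \varepsilon + \varepsilon = 2\varepsilon$$

for every $a \in \mathcal{A}$. □

Now we prove the main result of this section.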
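Before the proof, it may help to see the projection from Lemma 3.4 in executable form. The sketch below finds the scaling factor $c$ by bisection and then checks the $2\varepsilon$ bound on one made-up pair of measures. The bisection routine, the function name `project`, and the example measures are assumptions for illustration; the paper only specifies the form $\min\{1, cA(a)\}$.

```python
import math


def project(measure, s, iterations=200):
    """Sketch of Gamma_s: rescale by c and cap at 1 so that the total mass equals s."""
    assert 0 < s < len(measure) and all(v > 0 for v in measure)

    def mass(c):
        return sum(min(1.0, c * v) for v in measure)

    # mass(0) = 0 < s and mass(1 / min) = len(measure) > s, and mass is nondecreasing.
    lo, hi = 0.0, 1.0 / min(measure)
    for _ in range(iterations):  # bisection on the scaling factor c
        mid = (lo + hi) / 2.0
        if mass(mid) < s:
            lo = mid
        else:
            hi = mid
    return [min(1.0, hi * v) for v in measure]


eps, s = 0.1, 4
A0 = [0.05 + 0.025 * i for i in range(10)]          # |A0| = 1.625 <= s
A1 = [a * math.exp(eps if i % 2 == 0 else -eps)      # pointwise within e^{+-eps} of A0
      for i, a in enumerate(A0)]
P0, P1 = project(A0, s), project(A1, s)
worst_log_ratio = max(abs(math.log(p0 / p1)) for p0, p1 in zip(P0, P1))
print(worst_log_ratio)
```

On this instance no coordinate is capped, so the projected ratio differs from the original one only through the ratio of the two scaling factors, which is the second term in the lemma's proof.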



Proof of Theorem 3.3. We focus on analyzing the privacy properties of the output $\hat D = (x_1, \dots, x_T)$; the privacy of the final stage of the mechanism will follow from standard arguments in differential privacy. We will actually show the stronger guarantee that the sequence $v = (x_1, q_1, \dots, x_T, q_T)$ is differentially private for the data. Fix a pair of adjacent databases $D_0 \sim D_1$ and let $V_0, V_1$ denote the distribution on sequences $v$ when the mechanism is run on database $D_0, D_1$ respectively. We will show that with probability at least $1 - \delta/3$ over $v = (x_1, q_1, \dots, x_T, q_T) \leftarrow_R V_0$,

$$\left|\ln\frac{V_0(v)}{V_1(v)}\right| \le \varepsilon/3,$$

which is no weaker than $(\varepsilon/3, \delta/3)$-differential privacy. To do so, we analyze the privacy of each element of $v$, $x_t$ or $q_t$, and apply the composition analysis of Dwork, Rothblum, and Vadhan [DRV10]. We will bound the privacy loss of $x_t$ and $q_t$ separately. Define $\varepsilon_0 = 2\eta T/n$.



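Claim 3.5. For every $v$, and every $t \in [T]$,

$$\left|\ln\frac{V_0(x_t \mid x_1, q_1, \dots, x_{t-1}, q_{t-1})}{V_1(x_t \mid x_1, q_1, \dots, x_{t-1}, q_{t-1})}\right| \le \varepsilon_0.$$

Proof of Claim 3.5. We can prove the statement by direct calculation:

$$\begin{aligned}
\left|\ln\frac{V_0(x_t \mid x_1, q_1, \dots, x_{t-1}, q_{t-1})}{V_1(x_t \mid x_1, q_1, \dots, x_{t-1}, q_{t-1})}\right|
&\le \left|\ln\frac{\exp\left(-(\eta/2)\sum_{j=1}^{t-1}\left(1 + q_j(D_0) - q_j(x_t)\right)\right)}{\exp\left(-(\eta/2)\sum_{j=1}^{t-1}\left(1 + q_j(D_1) - q_j(x_t)\right)\right)}\right| \\
&\le \frac{\eta}{2}\left|\sum_{j=1}^{t-1}\left(1 + q_j(D_0) - q_j(x_t)\right) - \sum_{j=1}^{t-1}\left(1 + q_j(D_1) - q_j(x_t)\right)\right| \\
&\le \frac{\eta(t-1)}{2n} \le \frac{\eta T}{2n} \le \varepsilon_0.
\end{aligned}$$

□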



Claim 3.6. For every $v$, and every $t \in [T]$,

$$\left|\ln\frac{V_0(q_t \mid x_1, q_1, \dots, x_t)}{V_1(q_t \mid x_1, q_1, \dots, x_t)}\right| \le \varepsilon_0.$$

Proof of Claim 3.6. The sample $q_t$ is made according to $\tilde P_t$, which is the distribution corresponding to the projected measure $P_t$. First we'll look at the unprojected measure $Q_t$ and observe that, for any database $D$ and query $q$, we have

$$Q_t(q) = \exp\left(-(\eta/2)\sum_{j=1}^{t-1}\left(1 + q(D) - q(x_j)\right)\right).$$

Thus, if $Q_0(q)$ is the measure we'd have when database $D_0$ is the input, and $Q_1(q)$ is the measure we'd have when database $D_1$ is the input, then

$$\left|\ln\frac{Q_0(q)}{Q_1(q)}\right| \le \frac{\eta}{2}\left|\sum_{j=1}^{t-1}\left(q(D_0) - q(D_1)\right)\right| \le \frac{\eta T}{2n}$$

for every $q \in \mathcal{Q}$. Given that $Q_0$ and $Q_1$ satisfy this condition, Lemma 3.4 guarantees that the projected measures satisfy

$$\left|\ln\frac{P_0(q)}{P_1(q)}\right| \le \frac{\eta T}{n}.$$

Finally, we note that if the above condition is satisfied for every $q \in \mathcal{Q}$, then the distributions $\tilde P_0, \tilde P_1$ satisfy

$$\left|\ln\frac{\tilde P_0(q)}{\tilde P_1(q)}\right| \le \frac{2\eta T}{n} \le \varepsilon_0,$$

because the value of the normalizer also changes by at most a multiplicative factor of $e^{\pm\eta T/n}$.

To complete the proof, we observe that $V_b(q_t \mid x_1, q_1, \dots, x_t) = \tilde P_b(q_t)$ for $b \in \{0, 1\}$. This completes the proof of the claim. □

Given these two claims, the composition lemma (Lemma 2.8) (for $2T$-fold composition) guarantees that with probability at least $1 - \delta/3$,

$$\left|\ln\frac{V_0(v)}{V_1(v)}\right| \le \varepsilon_0\sqrt{4T\log(3/\delta)} + 4\varepsilon_0^2 T,$$

which is at most $\varepsilon/3$ by our choice of $\varepsilon_0$. This implies that $\hat D$ is $(\varepsilon/3, \delta/3)$-differentially private.

To complete the proof, we note that the sparse vector computation to find the $s$ queries with large error is $(\varepsilon/3, \delta/3)$-differentially private, by our choice of parameters (Lemma 2.7), and the answers to the queries found by sparse vector are $(\varepsilon/3, \delta/3)$-differentially private for our choice of parameters (Lemma 2.5).² The theorem now follows from standard composition properties of differential privacy. □



3.1.3 Query Privacy 

In this section we prove the following theorem 

Theorem 3.7. Algorithm 3 satisfies $(\varepsilon, \delta)$-one-query-to-many-analyst differential privacy.

Before proving query privacy of Algorithm 3, we will state a useful composition lemma. The lemma is a generalization of the "secrecy of the sample lemma" [KLN+11, DRV10] to the interactive setting. Consider the following game:

• Fix an $(\varepsilon, \delta)$-differentially private mechanism $A : \mathcal{U}^* \to \mathcal{R}$ and a bit $b \in \{0, 1\}$. Let $D_0 = \emptyset$.

• For $t = 1, \dots, T$:

  - The (possibly randomized) adversary $B(y_1, \dots, y_{t-1}; r)$ chooses two distributions $B_t^0, B_t^1$ such that $SD(B_t^0, B_t^1) \le \sigma$.

  - Choose $x_t \leftarrow_R B_t^b$ and let $D_t = D_{t-1} \cup \{x_t\}$.

  - Choose $y_t \leftarrow_R A(D_t)$.

For a fixed mechanism $A$ and adversary $B$, let $V^0$ be the distribution on $(y_1, \dots, y_T)$ when $b = 0$ and $V^1$ be the distribution on $(y_1, \dots, y_T)$ when $b = 1$.

Lemma 3.8. If $\varepsilon \le 1/2$ and $T\sigma \le 1/12$, then with probability at least $1 - T\delta - \delta'$ over $y = (y_1, \dots, y_T) \leftarrow_R V^0$,

$$\left|\ln\frac{V^0(y)}{V^1(y)}\right| \le \varepsilon(T\sigma)\sqrt{2T\log(1/\delta')} + 30\varepsilon^2(T\sigma)T.$$



² We could improve the constants in our privacy analysis slightly by finding the queries with large error using sparse vector and answering them using the Laplace mechanism in one step. However, in our algorithm for achieving one-analyst-to-many-analyst privacy, we need to do the analogous steps separately, and thus we chose to present them this way to maintain modularity.
We prove this lemma in Appendix A.

We also need another lemma about the Bregman projection onto the set of high-density measures (Definition 2.10).

Lemma 3.9 (Projection "hides" one action). Let $A_0 : \mathcal{A} \to [0, 1]$ and $A_1 : \mathcal{A} \cup \{a^*\} \to [0, 1]$ be two full-support measures over their respective sets of actions and $s \in (0, |\mathcal{A}|)$ be such that 1) $|A_0|, |A_1| \le s$ and 2) $A_0(a) = A_1(a)$ for every $a \in \mathcal{A}$. Let $A_0' = \Gamma_s A_0$ and $A_1' = \Gamma_s A_1$. Then $SD(A_0', A_1') \le 1/s$.

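Proof of Lemma 3.9. Using the form of the projection (Definition 2.10), it is not hard to see that for $a \ne a^*$, $A_0'(a) \ge A_1'(a)$. For convenience, we will write $A_0'(a^*) = 0$ even though $a^*$ is technically outside of the domain of $A_0'$. We can now show

$$\begin{aligned}
\sum_{a \in \mathcal{A} \cup \{a^*\}} |A_0'(a) - A_1'(a)| &= |A_0'(a^*) - A_1'(a^*)| + \sum_{a \ne a^*} |A_0'(a) - A_1'(a)| \\
&\le 1 + \sum_{a \ne a^*} |A_0'(a) - A_1'(a)| \\
&= 1 + \sum_{a \ne a^*} \left(A_0'(a) - A_1'(a)\right) && (A_0'(a) \ge A_1'(a) \text{ for } a \ne a^*) \\
&= 1 + |A_0'| - \left(|A_1'| - A_1'(a^*)\right) \le 1 + |A_0'| - \left(|A_1'| - 1\right) \\
&= 1 + s - (s - 1) = 2.
\end{aligned}$$

We also have that $|A_0'| = |A_1'| = s$, so

$$SD(A_0', A_1') = \frac{1}{2}\sum_{a \in \mathcal{A} \cup \{a^*\}} \left|\frac{A_0'(a)}{|A_0'|} - \frac{A_1'(a)}{|A_1'|}\right| = \frac{1}{2s}\sum_{a \in \mathcal{A} \cup \{a^*\}} |A_0'(a) - A_1'(a)| \le \frac{1}{s}.$$

□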
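Lemma 3.9 can also be sanity-checked numerically. The sketch below projects a measure with and without one extra action of full weight and compares the normalized results. The bisection-based `project` helper, the example measure, and the choice $A_1(a^*) = 1$ are assumptions for illustration, not part of the paper.

```python
def project(measure, s, iterations=200):
    """Sketch of Gamma_s: rescale by c and cap at 1 so that the total mass equals s."""
    assert 0 < s < len(measure) and all(v > 0 for v in measure)

    def mass(c):
        return sum(min(1.0, c * v) for v in measure)

    lo, hi = 0.0, 1.0 / min(measure)  # mass(lo) < s < mass(hi)
    for _ in range(iterations):       # bisection on the scaling factor c
        mid = (lo + hi) / 2.0
        if mass(mid) < s:
            lo = mid
        else:
            hi = mid
    return [min(1.0, hi * v) for v in measure]


def statistical_distance(m0, m1):
    """Half the L1 distance between the two measures after normalizing each to sum 1."""
    z0, z1 = sum(m0), sum(m1)
    return 0.5 * sum(abs(a / z0 - b / z1) for a, b in zip(m0, m1))


s = 4
A0 = [0.05 + 0.025 * i for i in range(10)]   # measure over the action set
A1 = A0 + [1.0]                               # same measure plus one extra action a*
P0 = project(A0, s) + [0.0]                   # convention from the proof: A0'(a*) = 0
P1 = project(A1, s)
sd = statistical_distance(P0, P1)
print(sd, 1.0 / s)
```

In this example the extra action receives the largest possible projected weight, so the bound $1/s$ is met essentially with equality.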



Now we can prove one-query-to-many-analyst privacy. 

Proof of Theorem 3.7. Fix a database $D$. Consider two adjacent query sets $\mathcal{Q}_0 \sim \mathcal{Q}_1$ and, without loss of generality, assume $\mathcal{Q}_1 = \mathcal{Q}_0 \cup \{q^*\}$ and that $q^* \in \mathcal{Q}_{id}$ for some analyst $id$. We write the output to all analysts as $v = (x_1, \dots, x_T, b_1, \dots, b_{|\mathcal{Q}|}, a_1, \dots, a_{|\mathcal{Q}|})$ where $\hat D = x_1, \dots, x_T$ is the database that is released to all analysts, $b_1, \dots, b_{|\mathcal{Q}|}$ is a sequence of bits that indicates whether or not $q_j(\hat D)$ is close to $q_j(D)$, and $a_1, \dots, a_{|\mathcal{Q}|}$ is a sequence of approximate answers to the queries $q_j(D)$ (or $\perp$, if $q_j(\hat D)$ is already accurate). We also write $v_{-id}$ to indicate the portion of $v$ that excludes outputs directly pertaining to analyst $id$'s queries. Let $V_0, V_1$ be the distribution on outputs when the query set is $\mathcal{Q}_0$ and $\mathcal{Q}_1$, respectively.

We will analyze each of the three parts of $v$ separately. First we show that $\hat D$, which is shared among all analysts, satisfies analyst privacy.

Claim 3.10. With probability at least $1 - \delta$ over $x_1, \dots, x_T \leftarrow_R V_0$,

$$\left|\ln\frac{V_0(x_1, \dots, x_T)}{V_1(x_1, \dots, x_T)}\right| \le \varepsilon.$$
Proof of Claim 3.10. To prove the claim, we show how the output $x_1, \dots, x_T$ can be viewed as the output of an instantiation of the mechanism analyzed by Lemma 3.8. Notice that for every $t \in [T]$ and $q_1, \dots, q_{t-1}$, we can write the measure $D_t$ over database items as

$$D_t(x) = \exp\left(-(\eta/2)\sum_{j=1}^{t-1}\left(1 + q_j(D) - q_j(x)\right)\right).$$

If we replace a single query $q_j$ with $q_j'$, and obtain the measure $D_t'$, then for every $x \in \mathcal{X}$,

$$\left|\ln\frac{D_t(x)}{D_t'(x)}\right| \le \eta.$$

Thus we can view $x_t$ as the output of an $\eta$-differentially private mechanism $A_D(q_1, \dots, q_{t-1})$, which fits into the framework of Lemma 3.8. (Here, $x_t$ plays the role of $y_t$ and $q_1, \dots, q_{t-1}$ plays the role of $D_{t-1}$ in the description of the game, while the input database $D$ is part of the description of $A$.)

Now, in order to apply Lemma 3.8, we need to argue that the distribution on samples $q_t$ when the query set is $\mathcal{Q}_0$ is statistically close to the distribution on samples $q_t$ when the query set is $\mathcal{Q}_1$. Fix any $t \in [T]$ and let $Q_0, Q_1$ be the measure $Q_t$ over queries maintained by the query boosting algorithm when the input query set is $\mathcal{Q}_0, \mathcal{Q}_1$, respectively. For $q \ne q^*$, we have

$$Q_0(q) = Q_1(q) = \exp\left(-(\eta/2)\sum_{j=1}^{t-1}\left(1 + q(D) - q(x_j)\right)\right).$$

Additionally, we have $Q_0(q^*) = 0$ (for notational convenience), while $Q_1(q^*) \in (0, 1]$. Thus, if we let $P_0 = \Gamma_s Q_0$ and $P_1 = \Gamma_s Q_1$, we will have $SD(P_0, P_1) \le 1/s$ by Lemma 3.9. Since the statistical distance is $1/s = 1/12T$, we can apply Lemma 3.8 to show that with probability at least $1 - \delta$,

$$\left|\ln\frac{V_0(x_1, \dots, x_T)}{V_1(x_1, \dots, x_T)}\right| \le \eta\sqrt{T\log(1/\delta)} + 5\eta^2 T \le \varepsilon \qquad \left(\eta = \varepsilon/(2\sqrt{T\log(1/\delta)})\right).$$

□



Now that we have shown $\hat D$ satisfies $(\varepsilon, \delta)$-one-query-to-many-analyst differential privacy, we can show that the remainder of the output satisfies perfect one-query-to-many-analyst privacy. Recall from the proof of Theorem 3.1 that $\hat D$ will be accurate for all but $s$ queries. That is, if we let $\{f_j\}_{j \in [|\mathcal{Q}|]}$ consist of the functions $f_j(D) = |q_j(D) - q_j(\hat D)|$, then

$$|\{j \mid f_j(D) > \alpha\}| \le s,$$

where $\alpha$ is chosen as in Theorem 3.1. By Lemma 2.7, the sparse vector algorithm will release bits $b_1, \dots, b_{|\mathcal{Q}|}$ (the indicator vector of the subset of queries with large error) such that for every $j \in [|\mathcal{Q}|]$, the distribution on $b_j$ does not depend on any function $f_{j'}$ for $j' \ne j$. Thus, if $z_{-id}$ contains all the bits of $b_1, \dots, b_{|\mathcal{Q}|}$ that do not correspond to queries in $\mathcal{Q}_{id}$, then the distribution of $z_{-id}$ does not depend on the queries asked by analyst $id$, and thus $z_{-id}$ is perfectly one-query-to-many-analyst private. Finally, for each query $q_j$ such that $b_j = 1$, the output to the owner of that query will include $a_j = q_j(D) + z_j$ where $z_j$ is an independent sample from an appropriately chosen noise distribution. These outputs do not depend on any other query, and thus are perfectly one-query-to-many-analyst private.

This completes the proof of the theorem. □
4 One-Analyst-to-Many-Analyst Privacy



4.1 An Offline Mechanism for Counting Queries 

In this section we present an algorithm for answering linear queries that satisfies the stronger notion of 
one-analyst-to-many-analyst privacy. The algorithm is quite similar to Algorithm 3, but with two notable 
modifications: 

1. In Algorithm 3, we had a "query player" who plays queries as actions, and attempts to find a query with large error. That is, the query player attempted to find $q \in \mathcal{Q}$ to maximize $q(D) - q(\hat D)$. In the new algorithm, we will have an "analyst player" who chooses analysts as actions and is trying to find an analyst $id \in [m]$ (recall that the queries are given to the mechanism in sets $\mathcal{Q}_1, \dots, \mathcal{Q}_m$) for which there is at least one query in $\mathcal{Q}_{id}$ with large error. That is, the analyst player attempts to find $id \in [m]$ to maximize $\max_{q \in \mathcal{Q}_{id}} q(D) - q(\hat D)$.

2. In Algorithm 3, we were able to compute a database $\hat D$ such that $|q(D) - q(\hat D)|$ was small for all but $s$ queries from $\mathcal{Q}$, for $s \approx n$. In order to answer every query accurately, we were able to find each of the $s$ queries using sparse vector, and answer each of the $s$ queries using the Laplace mechanism. In the new algorithm, we will compute a database $\hat D$ such that $\max_{q \in \mathcal{Q}_{id}} |q(D) - q(\hat D)|$ is small for all but $s$ analysts in the set $[m]$. We can still use sparse vector to find these $s$ analysts; however, each of the analysts may ask an exponential number of queries, in which case we cannot use the Laplace mechanism. However, since there are not too many analysts remaining, we can use $s$ independent copies of the multiplicative weights mechanism (each run with $\varepsilon' \approx \varepsilon/\sqrt{s}$) to answer their queries.
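To make the analyst player's objective concrete, the following non-private sketch computes, for each analyst, the worst error of a candidate synopsis over that analyst's query set, and lists the analysts that are not yet well served. The histogram representation, the analyst names, the threshold `alpha`, and both function names are hypothetical; the actual mechanism identifies these analysts with sparse vector rather than by exact comparison.

```python
def answer(query, histogram):
    """Evaluate a linear query (0/1 vector) on a database given as a histogram."""
    return sum(q * h for q, h in zip(query, histogram))


def analyst_errors(query_sets, D, D_hat):
    """For each analyst id, the maximum over q in Q_id of |q(D) - q(D_hat)|."""
    return {
        analyst: max(abs(answer(q, D) - answer(q, D_hat)) for q in queries)
        for analyst, queries in query_sets.items()
    }


def badly_served(query_sets, D, D_hat, alpha):
    """Analysts whose worst-case error on the synopsis exceeds alpha (non-private check)."""
    errors = analyst_errors(query_sets, D, D_hat)
    return sorted(a for a, err in errors.items() if err > alpha)


D = [0.5, 0.5, 0.0, 0.0]          # true database as a distribution over 4 elements
D_hat = [0.5, 0.25, 0.25, 0.0]    # candidate synopsis
query_sets = {
    "alice": [(1, 0, 0, 0)],
    "bob": [(0, 1, 0, 0), (1, 1, 0, 0)],
}
print(analyst_errors(query_sets, D, D_hat), badly_served(query_sets, D, D_hat, alpha=0.1))
```

Here one poorly answered query is enough to flag an analyst, which is why each flagged analyst's whole query set is then handed to a separate multiplicative weights instance.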

We can now state the algorithm. 

4.1.1 Accuracy Analysis 

Theorem 4.1. Algorithm 4 is $(\alpha, \beta)$-accurate for

$$\alpha = O\left(\frac{\sqrt{\log(|\mathcal{X}| + m)\log|\mathcal{Q}_{id}|}\,\log(m/\beta)\,\log^{3/4}(1/\delta)}{\varepsilon n^{1/3}}\right).$$



Proof. As we discussed above, the algorithm is computing an approximate equilibrium of the game

$$G_{D,m,\mathcal{Q}}(x, id) = \max_{q \in \mathcal{Q}_{id}} \frac{1 + q(D) - q(x)}{2}.$$

Let $v, v_s$ be the value and constrained value of this game, respectively. First we pin down the quantities $v$ and $v_s$.

Claim 4.2. For every $D, m, \mathcal{Q}$, the value and constrained value of $G_{D,m,\mathcal{Q}}$ is $1/2$.

The proof of this claim is omitted, but is nearly identical to that of Claim 3.2.

Let $\hat D = \frac{1}{T}\sum_{t=1}^{T} x_t$. By Corollary 2.13,

$$v_s - 2\rho \le \max_{I \in \Delta_s([m])} \mathbb{E}_{id \leftarrow_R I}\left[\max_{q \in \mathcal{Q}_{id}} \frac{1 + q(D) - q(\hat D)}{2}\right] \le v + 2\rho.$$
Algorithm 4 Offline Mechanism for Counting Queries w/ One-Analyst-to-Many-Analyst Privacy

Input: Database $D \in \mathcal{X}^n$, and $m$ sets of linear queries $\mathcal{Q}_1, \dots, \mathcal{Q}_m$. For $id \in [m]$, let $\mathcal{Q}_{id} = \mathcal{Q}_{id} \cup \neg\mathcal{Q}_{id}$.

Initialize: $D_0(x) = 1/|\mathcal{X}|$ for every $x \in \mathcal{X}$, $I_0(id) = 1/m$ for every $id \in [m]$.

Let $T = n^{2/3} \cdot \max\{\log|\mathcal{X}|, \log m\}$, $\eta = \varepsilon/(2\sqrt{T\log(1/\delta)})$, $s = 12T$.

DataPlayer:

  On input an analyst $id_t$
    For every $x \in \mathcal{X}$, let $D_t(x) = D_{t-1}(x) \cdot \exp\left(-\eta \max_{q \in \mathcal{Q}_{id_t}} \frac{1 + q(D) - q(x)}{2}\right)$
    Choose $x_t \leftarrow_R D_t$ and send $x_t$ to AnalystPlayer

AnalystPlayer:

  On input a data element $x_t$
    For every $id \in [m]$, let $I_{t+1}(id) = I_t(id) \cdot \exp\left(-\eta \max_{q \in \mathcal{Q}_{id}} \frac{1 + q(D) - q(x_t)}{2}\right)$
    Let $P_{t+1} = \Gamma_s I_{t+1}$
    Choose $id_{t+1} \leftarrow_R P_{t+1}$ and send $id_{t+1}$ to DataPlayer

GenerateSynopsis:

  Let $\hat D = (x_1, \dots, x_T)$
  Run sparse vector on $\hat D$ to get a sequence of at most $s$ analysts $I_{final} = \{id_1, \dots, id_s\} \subseteq [m]$
  For each analyst $id \in I_{final}$, run $A_{MW}(D, \mathcal{Q}_{id})$, using privacy parameters $\varepsilon' = \varepsilon/(10\sqrt{s\log(3s/\delta)})$ and $\delta' = \delta/3s$, to obtain a sequence of answers $a_{id}$.
  Output $\hat D$ to everyone and $a_{id}$ to analyst $id$ for each $id \in I_{final}$
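Applying Claim 4.2 and rearranging terms, we have that with probability $1 - \beta/3$,

$$\begin{aligned}
\max_{I \in \Delta_s([m])} \mathbb{E}_{id \leftarrow_R I}\left[\max_{q \in \mathcal{Q}_{id}} |q(D) - q(\hat D)|\right]
&= \max_{I \in \Delta_s([m])} \mathbb{E}_{id \leftarrow_R I}\left[\max_{q \in \mathcal{Q}_{id}} q(D) - q(\hat D)\right] \le 4\rho \\
&= 4\left(\eta + \frac{\max\{\log|\mathcal{X}|, \log m\}}{\eta T} + \frac{4\log(3/\beta)}{\sqrt{T}}\right) \\
&= O\left(\frac{\sqrt{\log(|\mathcal{X}| + m)\log(1/\delta)} + \log(1/\beta)}{\varepsilon n^{1/3}}\right) := \alpha_{\hat D}.
\end{aligned}$$

The previous statement suffices to show that $\max_{q \in \mathcal{Q}_{id}} |q(D) - q(\hat D)| \le \alpha_{\hat D}$ for all but $s$ analysts $id \in [m]$. Otherwise, the uniform distribution over the analysts for which the error bound of $\alpha_{\hat D}$ does not hold would be a distribution over analysts, contained in $\Delta_s([m])$, with expected error larger than $\alpha_{\hat D}$.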

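Since there are at most $s$ such analysts, we can run the sparse vector algorithm (Lemma 2.7), and, with probability at least $1 - \beta/3$, it will identify every analyst $id$ such that the maximum error over all queries in $\mathcal{Q}_{id}$ is larger than $\alpha_{\hat D} + \alpha_{SV}$ for

$$\alpha_{SV} = O\left(\frac{\sqrt{s\log(1/\delta)}\,\log(m/\beta)}{\varepsilon n}\right).$$

There are at most $s$ such analysts. Thus, running the multiplicative weights mechanism (Lemma 2.6) independently for each of these analysts' queries, with privacy parameters $\varepsilon' = O(\varepsilon/\sqrt{s\log(s/\delta)})$ and $\delta' = O(\delta/s)$, will yield answers such that, with probability $1 - \beta/3$, for every $id \in I_{final}$,

$$\begin{aligned}
\max_{q \in \mathcal{Q}_{id}} |q(D) - a_q| &\le O\left(\frac{s^{1/4}\log^{1/4}|\mathcal{X}|\,\sqrt{\log(|\mathcal{Q}_{id}|/\beta)}\,\log^{3/4}(s/\delta)}{\sqrt{\varepsilon n}}\right) \\
&\le O\left(\frac{n^{1/6}\sqrt{\log(|\mathcal{X}| + m)\log(|\mathcal{Q}_{id}|/\beta)}\,\log^{3/4}(1/\delta)}{\sqrt{\varepsilon n}}\right) \\
&\le O\left(\frac{\sqrt{\log(|\mathcal{X}| + m)\log(|\mathcal{Q}_{id}|/\beta)}\,\log^{3/4}(1/\delta)}{\varepsilon^{1/2} n^{1/3}}\right) := \alpha_{MW}.
\end{aligned}$$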



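Taking a union bound, observing that the maximum error on any query is $\max\{\alpha_{\hat D} + \alpha_{SV}, \alpha_{MW}\}$, and simplifying, we get that the mechanism is $(\alpha, \beta)$-accurate for

$$\alpha = O\left(\frac{\sqrt{\log(|\mathcal{X}| + m)\log|\mathcal{Q}_{id}|}\,\log(m/\beta)\,\log^{3/4}(1/\delta)}{\varepsilon n^{1/3}}\right).$$

□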



4.1.2 Data Privacy 

Theorem 4.3. Algorithm 4 satisfies $(\varepsilon, \delta)$-differential privacy for the data.

We omit the proof of this theorem. The proof follows that of Theorem 3.3 almost identically. The only difference is that in the final step, we need to argue that running $s$ independent copies of multiplicative weights with privacy parameters $\varepsilon' = \Theta(\varepsilon/\sqrt{s\log(s/\delta)})$ and $\delta' = \Theta(\delta/s)$ satisfies $(\varepsilon/3, \delta/3)$-differential privacy, which follows directly from the composition properties of differential privacy (Lemma 2.8).
4.1.3 Query Privacy

In this section we prove the following theorem.

Theorem 4.4. Algorithm 4 satisfies $(\varepsilon, \delta)$-one-analyst-to-many-analyst differential privacy.

Proof. Fix a database $D$. Consider two adjacent sets of queries $\mathcal{Q}_0, \mathcal{Q}_1$. Without loss of generality assume $\mathcal{Q}_0 = \mathcal{Q}_{id_1} \cup \dots \cup \mathcal{Q}_{id_m}$ and $\mathcal{Q}_1 = \mathcal{Q}_0 \cup \mathcal{Q}_{id^*}$. That is, $\mathcal{Q}_1$ is just $\mathcal{Q}_0$ with an additional set of queries $\mathcal{Q}_{id^*}$ added. We write the output to all analysts as $v = (x_1, \dots, x_T, b_1, \dots, b_m, a_1, \dots, a_m)$ where $\hat D = x_1, \dots, x_T$ is the database that is released to all analysts, $b_1, \dots, b_m$ is a sequence of bits that indicates whether or not $q(\hat D)$ is close to $q(D)$ for every $q \in \mathcal{Q}_{id_j}$, and $a_1, \dots, a_m$ is a sequence consisting of the output of the multiplicative weights mechanism for every analyst $id \in I_{final}$ and $\perp$ for every other analyst. Let $V_0, V_1$ be the distribution on outputs when the queries are $\mathcal{Q}_0$ and $\mathcal{Q}_1$, respectively.

The proof closely follows the proof of one-query-to-many-analyst privacy for Algorithm 3. Showing that the final two parts of the output are query private is essentially the same, so we will focus on proving that $\hat D$ satisfies one-analyst-to-many-analyst privacy.

Claim 4.5. With probability at least $1 - \delta$ over $x_1, \dots, x_T \leftarrow_R V_0$,

$$\left|\ln\frac{V_0(x_1, \dots, x_T)}{V_1(x_1, \dots, x_T)}\right| \le \varepsilon.$$

Proof of Claim 4.5. To prove the claim, we show how the output $x_1, \dots, x_T$ can be viewed as the output of an instantiation of the mechanism analyzed by Lemma 3.8. Notice that for every $t \in [T]$ and $id_1, \dots, id_{t-1}$, we can write the measure $D_t$ over database items as

$$D_t(x) = \exp\left(-(\eta/2)\sum_{j=1}^{t-1} \max_{q \in \mathcal{Q}_{id_j}}\left(1 + q(D) - q(x)\right)\right).$$

If we replace a single analyst $id_j$ with $id_j'$, and obtain the measure $D_t'$, then for every $x \in \mathcal{X}$,

$$\left|\ln\frac{D_t(x)}{D_t'(x)}\right| \le \eta.$$

Thus we can view $x_t$ as the output of an $\eta$-differentially private mechanism $A_D(id_1, \dots, id_{t-1})$, which fits into the framework of Lemma 3.8. (Here, $x_t$ plays the role of $y_t$ and $id_1, \dots, id_{t-1}$ plays the role of $D_{t-1}$ in the description of the game, while the input database $D$ is part of the description of $A$.)

As before, in order to apply Lemma 3.8, we argue that the distribution on analysts $id_t$ when the query set is $\mathcal{Q}_0$ is statistically close to the distribution on analysts $id_t$ when the query set is $\mathcal{Q}_1$. The argument does not change significantly; thus we can apply Lemma 3.8 to show that with probability at least $1 - \delta$,

$$\left|\ln\frac{V_0(x_1, \dots, x_T)}{V_1(x_1, \dots, x_T)}\right| \le \eta\sqrt{T\log(1/\delta)} + 5\eta^2 T \le \varepsilon \qquad \left(\eta = \varepsilon/(2\sqrt{T\log(1/\delta)})\right).$$

□

As before, the remainder of the output satisfies perfect one-analyst-to-many-analyst privacy. This completes the proof of the theorem. □
5 Online One-Query-to-Many-Analyst Privacy

In this section, we present a mechanism that provides one-query-to-many-analyst privacy in an online setting. The mechanism can give accurate answers to any fixed sequence of queries that are given to the mechanism one at a time. Typically, mechanisms for differentially private query release in the interactive setting give accuracy guarantees for adaptively chosen queries.

The mechanism is similar to the online multiplicative weights algorithm of Hardt and Rothblum [HR10]. 
In their algorithm, a hypothesis about the true database is maintained throughout the sequence of queries. 
When a query arrives, it is classified according to whether or not the current hypothesis accurately answers 
that query. If it does, then the query is answered according to the hypothesis. Otherwise, the query is 
answered with a noisy answer computed from the true database and the hypothesis is updated using the 
multiplicative weights update rule. 
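The per-query classification described above can be sketched as a small function. This is an illustration only: the threshold name `tau`, the function name, and passing the noise value in as an argument are assumptions, and the sign convention follows the update direction $\mathrm{sgn}(q_i(H_t) - q_i(D) - z_i)$ used in Algorithm 5.

```python
def classify_round(true_answer, hypothesis_answer, noise, tau):
    """Return (released_answer, update_sign) for one query.

    update_sign is None when the hypothesis answers the query (no update needed);
    otherwise it is the sign attached to the query in the multiplicative weights update.
    """
    gap = true_answer - hypothesis_answer + noise
    if abs(gap) <= tau:
        # The hypothesis looks accurate on this query: answer from the hypothesis.
        return hypothesis_answer, None
    # Otherwise answer with the noisy true value and record the update direction,
    # i.e. the sign of hypothesis_answer - true_answer - noise.
    sign = 1 if -gap > 0 else -1
    return true_answer + noise, sign


print(classify_round(0.50, 0.48, 0.0, tau=0.05))   # hypothesis is close enough
print(classify_round(0.20, 0.50, 0.0, tau=0.05))   # hypothesis overestimates the query
```

A positive sign means the hypothesis overestimates the query, so the subsequent update shifts weight away from universe elements that satisfy it.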

The main challenge in making that algorithm query private is to argue that the hypothesis does not 
depend too much on the previous queries. We overcome this difficulty by "sampling from the hypothesis." 
Recall that a database can be thought of as a distribution over the data universe, and thus it makes sense to
sample from the hypothesis database. The two constraints we must balance are 1) the need to take many 
samples, so that the database we obtain by sampling accurately reflects the hypothesis database and 2) the 
need to limit the impact of any one query on the sampled database. In order to balance these two constraints, 
we introduce batching. That is, instead of updating every time we find a query not well-answered by the 
hypothesis, we batch together s queries at a time, and do one update on the "average" of those queries, 
which will limit the influence of any one query. 

A note on terminology: we will break the execution of the algorithm into $T$ epochs. The $t$-th epoch is the set of rounds for which the current hypothesis is $H_t$. Rounds $i$ in which the query is answered using the real database are called bad rounds; these are the rounds in which $|q_i(D) - q_i(H_t) + z_i| > \tau$. Rounds that are not bad are good rounds.

We will use the following tail bound on sums of Laplace variables. 

Lemma 5.1 ([GRU12]). Let $X_1, \dots, X_T$ be $T$ independent draws from $\mathrm{Lap}(2/\varepsilon)$, and let $X = \sum_{t=1}^{T} X_t$. Then, we have

$$\Pr\left[\,|X| > 5\varepsilon^{-1}\sqrt{T}\log(2/\beta)\right] \le \beta.$$



5.1 Accuracy 

In this section, we will prove that our mechanism is accurate for all queries in the stream. Intuitively, there 
are three ways that our algorithm might give an inaccurate answer, and we treat each separately. First is that, 
in a good round, the answer given by the hypothesis may be a bad approximation to the true answer. Second, 
in a bad round, the answer given may have too much noise. We address these two cases with straightforward 
arguments showing that the noise is not too large in any round. 

The third way the algorithm may be inaccurate is if there are more than $R$ bad rounds, and the algorithm terminates early. We show that this is not the case using a potential argument showing that after sufficiently many bad rounds, the hypothesis $D_t$ and the sample $H_t$ will be accurate for all queries in the stream, and thus there will not be any more bad rounds. The potential argument is a simple extension of the argument in Hardt and Rothblum [HR10] that handles the additional error coming from taking samples from $D_t$ to obtain $H_t$.
Algorithm 5 Analyst-Private Multiplicative Weights for Counting Queries
Input: Database $D \in \mathcal{X}^n$, sequence $q_1, \dots, q_k$ of counting queries

Initialize: $D_0(x) = 1/|\mathcal{X}|$ for every $x \in \mathcal{X}$, $H_0 = D_0$, $U_0 = \emptyset$, $s_0 = s + \mathrm{Lap}(2/\varepsilon)$, $t = 0$, $r = 0$

1 128n 2 / 5 v /log |Af| log(4fc//3) log(l/<5) 



n = 32n 4/5 log(4fc//3) T = n 4/5 log |#| 
_ 20000 log 3 / 4 \X\ log 1 / 4 (4fc/ ) g) log 5 / 4 {4/5) 

£-3/2^2/5 

_ 80000 log 3 / 4 |^| log 5 / 4 (4fc/ff) log 5 / 4 (4/(5) 

£-3/2^2/5 

$R = 2sT$

AnswerQueries:

  While $t < T$, $r < R$, $i \le k$
    On input query $q_i$:
      Let $z_i = \mathrm{Lap}(\sigma)$
      If $|q_i(D) - q_i(H_t) + z_i| \le \tau$
        Output $q_i(H_t)$
      Else if $|q_i(D) - q_i(H_t) + z_i| > \tau$
        $u = \mathrm{sgn}(q_i(H_t) - q_i(D) - z_i) \cdot q_i$, $U_t = U_t \cup \{u\}$
        Output $q_i(D) + z_i$
        Let $r = r + 1$
        If $|U_t| > s_t$
          Let $(D_{t+1}, H_{t+1}) = \mathrm{Update}(D_t, U_t)$
          Let $U_{t+1} = \emptyset$, $s_{t+1} = s + \mathrm{Lap}(2/\varepsilon)$
          Let $t = t + 1$
      Advance to query $q_{i+1}$

Update:

  Input: distribution $D_t$, update queries $U_t = \{u_1, \dots, u_{s_t}\}$
  Let $\bar u_t(x) = \frac{1}{s}\sum_{j=1}^{s_t} u_j(x)$ for every $x \in \mathcal{X}$
  Let $D_{t+1}(x) = \exp(-(\eta/2)\bar u_t(x)) D_t(x)$ for every $x \in \mathcal{X}$ and renormalize
  Let $H_{t+1}$ be $n$ independent samples from $D_{t+1}$
  Return: $(D_{t+1}, H_{t+1})$
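The Update step, including the "sampling from the hypothesis" idea, can be sketched as follows. This is a rough illustration rather than the paper's exact procedure: the step size name `eta`, the normalization of the batch average by `s`, the representation of signed queries as vectors with entries in {-1, 0, 1}, and the use of a seeded `random.Random` are all assumptions.

```python
import math
import random


def update(D_t, update_queries, eta, s, n, rng):
    """One batched multiplicative weights update followed by sampling.

    D_t: current hypothesis, a probability vector over the data universe.
    update_queries: the batched signed queries u, each a vector over the universe.
    Returns (D_next, H_next), where H_next is the histogram of n samples from D_next.
    """
    size = len(D_t)
    # Average the batch so that no single query moves the hypothesis by much.
    u_bar = [sum(u[x] for u in update_queries) / s for x in range(size)]
    # Multiplicative weights step followed by renormalization.
    weights = [D_t[x] * math.exp(-(eta / 2.0) * u_bar[x]) for x in range(size)]
    total = sum(weights)
    D_next = [w / total for w in weights]
    # Release only a sample of size n from the new hypothesis.
    samples = rng.choices(range(size), weights=D_next, k=n)
    H_next = [samples.count(x) / n for x in range(size)]
    return D_next, H_next


D_t = [0.25, 0.25, 0.25, 0.25]
batch = [[1, 1, 0, 0], [1, 0, 0, 0]]      # two signed queries the hypothesis overestimates
D_next, H_next = update(D_t, batch, eta=1.0, s=2, n=1000, rng=random.Random(0))
print(D_next, H_next)
```

Larger `n` makes the sampled hypothesis track `D_next` more closely, while a larger batch size `s` reduces how much any one query can shift it; this is the trade-off the batching is meant to balance.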
Theorem 5.2. Algorithm 5 is $(\alpha, \beta)$-accurate for

$$\alpha = O\left(\frac{\log^{3/2}(k/\beta)\sqrt{\log|\mathcal{X}|\log(1/\delta)}}{\varepsilon^{3/2} n^{2/5}}\right).$$

We note that we can achieve a slightly better dependence on $k$, $|\mathcal{X}|$, $1/\varepsilon$, $1/\delta$, and $1/\beta$ by setting the parameters a bit more carefully. See Section 5.4 for an intuitive picture of how to set the parameters optimally. We have also made no attempt to optimize the constant factors in the algorithm.

Proof. First we show that, as long as the algorithm has not terminated early, it answers every query accu- 
rately. 

Claim 5.3. Before the algorithm terminates, with probability $1 - \beta/4$, every query is answered with error at most $\tau + 6\sigma\log(3k/\beta)$.

Proof of Claim 5.3. Condition on the event that $|z_i| \le 6\sigma\log(4k/\beta)$ for every $i = 1, 2, \dots, k$. A standard analysis of the tails of the Laplace distribution shows that this event occurs with probability at least $1 - \beta/3$.

First we consider bad rounds. In these rounds $q_i$ is answered with $q_i(D) + z_i$. Since we have assumed $|z_i|$ is not too large, all of these queries are answered accurately.

Now we consider good rounds. In these rounds we answer with $q_i(H_t)$, and we will only have a good round if $|q_i(D) - q_i(H_t) + z_i| \le \tau$. Since we have assumed a bound on $|z_i|$, we can only have a good round if $|q_i(D) - q_i(H_t)| \le \tau + 6\sigma\log(3k/\beta)$. □

Now we must show that the algorithm does not terminate early. Recall that it can terminate early either because it hits a limit on the number of epochs, or because it hits a limit on the number of bad rounds. We will use a potential argument to show that there cannot be too many epochs. The number of bad rounds in epoch $t$ is a random variable $s_t$, and we will also show that with high probability, there are not too many bad rounds within the $T$ epochs.

Claim 5.4. With probability $1 - 3\beta/4$, the algorithm does not terminate before answering $k$ queries.

Proof of Claim 5.4. We will use a potential argument à la Hardt and Rothblum [HR10] on the sequence of databases $D_t$. The potential function will be

$$\Phi_t = RE(D \| D_t) := \sum_{x \in \mathcal{X}} D(x)\log(D(x)/D_t(x)).$$

Elementary properties of the relative entropy function show that $\Phi_t \ge 0$ and $\Phi_0 = RE(D \| D_0) \le \log|\mathcal{X}|$. A lemma of Hardt and Rothblum expresses the potential decrease from the multiplicative weights update rule in terms of the error of the current hypothesis on the update query.

Lemma 5.5 ([HR10]). $\Phi_{t-1} - \Phi_t \ge \eta\left(\bar u_t(D) - \bar u_t(D_{t-1})\right) - \eta^2$.

Since the potential function is bounded between $0$ and $\log|\mathcal{X}|$, we can get a bound on the number of epochs by showing that the potential decreases significantly between most epochs. Given the preceding lemma, we simply need to show that the queries $\bar u_1, \bar u_2, \dots$ have large (positive) error.

Recall that u t = j Ylu£U t u - ^ so recai l tnat if u € W and u = then the reason % is in U is because 
qi(D) — qi(Ht-i) + Z{ > r. Similarly, if u = ^qi, then qi(D) — qi(H t -i) + Z{ < —r. We will focus on the 



24 



first case where qi{D) — ^(A-i) + Z{ > r, the other case will follow similarly. We can get a lower bound 
on u(D) — u(A-i) as follows. 

u(D) - u(A-i) > u{D) - u(H t -i) + z t - \ Zi \ - |<ft(A-i) - %(A-i)| 

> T - |Zi| - |ft(flt_l) - %(A-l)| 

We need to show that the right-hand side of the final expression is large. We have already conditioned on the event that $|z_i| \le 6\sigma\log(3k/\beta) \le \tau/4$. Recall that $H_{t-1}$ is a collection of $n$ samples from $D_{t-1}$. Thus a simple Chernoff bound (over the $n$ samples) and a union bound (over the $k$ queries) shows that, with probability $1 - \beta/4$, for every $i \in [k]$, $|q_i(H_{t-1}) - q_i(D_{t-1})| \le \sqrt{16\log(3T/\beta)/n} \le \tau/4$.
Thus, with probability at least $1 - 2\beta/3$, for every $t$ and every $u \in U_t$,

\[ u(D) - u(D_{t-1}) \ge \tau - \tau/4 - \tau/4 = \tau/2. \]

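Now we have

\[ u_t(D) - u_t(D_{t-1}) = \frac{1}{s}\sum_{u \in U_t}\bigl(u(D) - u(D_{t-1})\bigr) \ge \frac{\tau\,|U_t|}{2s}. \]

Conditioning on the event that all of the noise values $z_i$ are small and all of the sampled hypotheses $H_t$ are accurate for $D_t$ on every query, Lemma 5.5 gives

\[ \Phi_{t-1} - \Phi_t \ge \frac{\eta\tau\,|U_t|}{2s} - \eta^2 = \frac{\eta\tau}{2} + \frac{\eta\tau}{2s}\,S_t - \eta^2, \]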



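where $S_t$ is the value of the sample $\mathrm{Lap}(2/\varepsilon)$ used to compute $s_t$ in the $t$-th epoch. Thus, applying Lemma 5.1 to $S = \sum_{t \le T'} S_t$, we have that with probability $1 - \beta/4$,

\begin{align*}
\Phi_0 - \Phi_{T'} &\ge \frac{\eta\tau T'}{2} - \frac{\eta\tau}{2s}\,|S| - T'\eta^2 \\
&\ge \frac{\eta\tau T'}{2} - \frac{\eta\tau}{2s}\,5\varepsilon^{-1}\sqrt{T'}\log(2/\beta) - T'\eta^2.
\end{align*}

Now, noting that $\tau/2 \ge 8\eta$, we can simplify to obtain

\[ \Phi_0 - \Phi_{T'} \ge 2\eta^2 T' - \eta^2 T' \ge \eta^2 T'. \]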

Thus, conditioning on all the events above, $T' \le \log|\mathcal{X}|/\eta^2 \le n^{4/5}\log|\mathcal{X}|$. These events all occur together with probability at least $1 - 3\beta/4$, and thus the algorithm does not terminate because it hits the limit of $T$ epochs. Lastly, we need to show that the algorithm does not hit the limit of $R$ bad rounds within those at most $T$ epochs. Notice that the number of bad rounds is at most $\sum_{t=1}^{T} s_t = \sum_{t=1}^{T}(s + S_t)$, where $S_t$ is the sample of $\mathrm{Lap}(2/\varepsilon)$ used to compute $s_t$. Applying Lemma 5.1 again, we have

\[ \sum_{t=1}^{T}(s + S_t) \le Ts + \sum_{t=1}^{T} S_t \le Ts + 5\varepsilon^{-1}\sqrt{T}\log(2/\beta) \le 2Ts = R. \]

Thus the algorithm does not terminate due to having more than $R$ bad rounds. Since the algorithm does not hit its limit of $T$ epochs or $R$ bad rounds, except with probability at most $3\beta/4$, the claim is proven. □

Combining the previous two claims proves the theorem. □ 

5.2 Data Privacy



In this section we establish that our mechanism satisfies differential privacy. Our proof will rely on a modular 
analysis of interactive differentially private algorithms from Gupta, Roth, and Ullman [GRU12]. Although 
we have not presented our algorithm in their framework, the algorithm can easily be seen to fit, and thus we 
will state an adapted version of their theorem without proof. 

Theorem 5.6 ([GRU12], Adapted). If Algorithm 5 experiences at most $R$ bad rounds, and the parameters are set so that $\sigma \ge 1000\sqrt{R}\,\log(4/\delta)/(\varepsilon n)$, then Algorithm 5 is $(\varepsilon, \delta)$-differentially private.

Theorem 5.7. Algorithm 5 satisfies $(\varepsilon, \delta)$-differential privacy.

Proof. The theorem follows directly from Theorem 5.6 and our choice of $R$. □

5.3 Query Privacy

In this section we prove that this mechanism satisfies one-query-to-many-analyst privacy.

Theorem 5.8. Algorithm 5 is $(\varepsilon, \delta)$-one-query-to-many-analyst private.

Proof. Fix the input database $D$. We will also fix the coins of the Laplacian noise, as this source of randomness will not be used in our proof of analyst privacy. That is, we will show that for every value of the Laplacian random variables, the mechanism satisfies analyst privacy. Thus the entire mechanism is a convex combination of analyst private mechanisms and is itself analyst private. Consider any two adjacent sequences of queries $Q_0, Q_1$. Without loss of generality, we will assume that $Q_0 = q_1, \ldots, q_k$ and $Q_1 = q^*, q_1, \ldots, q_k$. That is, $Q_1$ is just $Q_0$ with an additional query $q^*$ inserted at the beginning of the sequence. For notational simplicity, we assume that every query in $Q_0$ has a fixed index, regardless of the presence of $q^*$. More generally, we could identify each query in the sequence by a unique index (say, a long random string) that is independent of the other queries in the sequence. We can also assume that the sequence of queries $Q_0$ is public, and we are only trying to argue that the presence or absence of $q^*$ is hidden.

Recall that what we want to argue is that the answers to all queries in $Q_0$ are private, but not that the answer to $q^*$ is private (if it is requested). To simplify the argument, we will represent the answers to the queries in $Q_0$ in a "reduced form." The reduced form is a sequence $\{(H_t, i_t)\}_{t \in [T]}$, where $H_t$ is the hypothesis used in the $t$-th epoch and $i_t$ is the index of the last query in that epoch (the one that caused the mechanism to switch to hypothesis $H_{t+1}$). Observe that for a fixed database $D$, fixed Laplacian noise, and a fixed sequence of queries $Q_0$, it is possible to simulate the output of the mechanism for all queries in $Q_0$ given only this information. To see that this is indeed the case, notice that if we fix a hypothesis $H_t$, the database $D$, and the coins of the Laplacian noise, then we can determine for any query $q$ whether $q$ will lead to a good round or a bad round (recall that a bad round is one in which $|q(D) - q(H) + z| > \alpha$). So once we begin epoch $t$ with hypothesis $H_t$, we can determine all the bad rounds, and once we are given $i_t$ we can determine when epoch $t$ ends and epoch $t + 1$ begins. Once this occurs, we already have the next hypothesis $H_{t+1}$ and can continue simulating.

Let $\mathcal{V}_0, \mathcal{V}_1$ be the distributions over sequences $\{(H_t, i_t)\}$ when the query sequence is $Q_0, Q_1$, respectively. We will show that with probability at least $1 - \delta$, if $\{(H_t, i_t)\}_{t \in [T]}$ is drawn from $\mathcal{V}_0$, then

\[ \left|\ln\frac{\mathcal{V}_0(\{(H_t, i_t)\}_{t \in [T]})}{\mathcal{V}_1(\{(H_t, i_t)\}_{t \in [T]})}\right| \le \varepsilon. \]

Recall that $U_t$ is the set of queries that are used to update the distribution $D_t$ and yield $D_{t+1}$. We will use $U_{\le t} = \bigcup_{j=0}^{t} U_j$ to denote the set of all queries used to update the distributions $D_0, \ldots, D_t$. Notice that if $q^*$ does not get added to the set $U_0$, then $\mathcal{V}_0$ and $\mathcal{V}_1$ will be distributed identically, and thus there is no privacy loss. Therefore we will assume that $q^* \in U_0$. First we will reason about the joint distribution of the first component of the output.



Claim 5.9. For every $H_0, i_0$,

\[ \left|\ln\frac{\mathcal{V}_0(H_0, i_0)}{\mathcal{V}_1(H_0, i_0)}\right| \le \varepsilon/2. \]



Proof of Claim 5.9. Since $H_0$ does not depend on the query sequence, it will be identically distributed in both cases. Once $H_0$ is fixed, we can determine, for every query in the sequence $q_1, \ldots, q_k$, whether or not that query will cause an update. Fix query $q_{i_0}$ and assume that it is the $s$-th update query in the sequence $q_1, \ldots, q_k$ and the $(s+1)$-st update query in the sequence $q^*, q_1, \ldots, q_k$. Then $\mathcal{V}_0(i_0 \mid H_0) = \Pr[s_0 = s]$ and $\mathcal{V}_1(i_0 \mid H_0) = \Pr[s_0 = s + 1]$. Since $s_0$ is computed using a sample of $\mathrm{Lap}(2/\varepsilon)$, the basic properties of the Laplace distribution give $|\ln(\mathcal{V}_0(i_0 \mid H_0)/\mathcal{V}_1(i_0 \mid H_0))| \le \varepsilon/2$. □
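The Laplace property used here can be checked numerically. The sketch below assumes continuous $\mathrm{Lap}(2/\varepsilon)$ noise (any rounding of $s_0$ to an integer is ignored) and evaluates the log-density ratio under a shift of one on a grid; the helper names are illustrative.

```python
import numpy as np

def laplace_log_density(x, scale):
    """Log-density of a zero-mean Laplace distribution with the given scale."""
    return -np.abs(x) / scale - np.log(2.0 * scale)

def max_log_ratio_under_unit_shift(eps, grid):
    """Largest |ln f(x) - ln f(x + 1)| over the grid, for f the density of Lap(2/eps)."""
    scale = 2.0 / eps
    diff = laplace_log_density(grid, scale) - laplace_log_density(grid + 1.0, scale)
    return float(np.max(np.abs(diff)))

eps = 0.5
grid = np.linspace(-50.0, 50.0, 10001)
worst = max_log_ratio_under_unit_shift(eps, grid)
print(worst, eps / 2)
```

Because $\bigl||x| - |x+1|\bigr| \le 1$, the log-ratio never exceeds $1/\text{scale} = \varepsilon/2$, which is the bound stated in Claim 5.9.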

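Now we reason about the remaining components $(H_1, i_1), \ldots, (H_T, i_T)$.

Claim 5.10. For every $H_0, i_0$, with probability at least $1 - \delta$ over the choice of components $v = (H_1, i_1, \ldots, H_T, i_T) \leftarrow_R (\mathcal{V}_0 \mid H_0, i_0)$,

\[ \left|\ln\frac{\mathcal{V}_0(v \mid H_0, i_0)}{\mathcal{V}_1(v \mid H_0, i_0)}\right| \le \varepsilon/2. \]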



Proof of Claim 5.10. We will show that $v$ is the $nT$-fold composition of $(\varepsilon_0, 0)$-differentially private mechanisms for suitable $\varepsilon_0$. Fix a prefix $v_{t-1} = (H_0, i_0, \ldots, H_{t-1}, i_{t-1})$. Given this prefix, we can determine for any given sequence of queries $q_1, \ldots, q_{i_{t-1}}$ or $q^*, q_1, \ldots, q_{i_{t-1}}$ which queries are in the update set. Moreover, if $U_{\le t}$ is the set of all update queries from the first query sequence, and $U'_{\le t}$ is the set of all update queries from the second sequence, then $U_{\le t} \,\triangle\, U'_{\le t} = \{q^*\}$.

Now consider the distribution of $H_t$. Each sample in $H_t$ comes from the distribution $D_t$, which is either

\[ D_t(x) \propto \exp\Bigl(-(\eta/s)\sum_{u \in U_{\le t}} u(x)\Bigr) \quad\text{or}\quad D'_t(x) \propto \exp\Bigl(-(\eta/s)\sum_{u \in U'_{\le t}} u(x)\Bigr). \]

Given this, it is easy to see that for any $x$ we have $|\ln(D_t(x)/D'_t(x))| \le 2\eta/s := \varepsilon_0$. Notice that once $i_{t-1}$ and $H_t$ are fixed, $i_t$ depends only on the choice of $s_t$ (the number of bad rounds to allow before updating the hypothesis), which is independent of the query sequence and thus incurs no additional privacy loss. Thus the only privacy loss comes from the $n$ samples in each of the $T$ epochs, and is thus an $nT$-fold adaptive composition of $(\varepsilon_0, 0)$-differentially private mechanisms. A standard composition analysis shows that the components $v$ are $(\varepsilon', \delta)$-DP for $\varepsilon' = \varepsilon_0\sqrt{2nT\log(1/\delta)} + 2\varepsilon_0^2\,nT \le \varepsilon/2$. This completes the proof of the claim. □
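To make the composition step concrete, the sketch below evaluates the standard advanced composition bound for $k$ adaptive uses of an $(\varepsilon_0, 0)$-DP mechanism, $\varepsilon' = \varepsilon_0\sqrt{2k\ln(1/\delta)} + k\varepsilon_0(e^{\varepsilon_0} - 1)$, together with the simplified form $2k\varepsilon_0^2$ for the second term (valid when $\varepsilon_0 \le 1$). Here $k$ plays the role of $nT$; the numeric values are hypothetical and are not parameters from the paper.

```python
import math

def advanced_composition_epsilon(eps0, k, delta):
    """Advanced composition bound for k adaptive (eps0, 0)-DP mechanisms."""
    return eps0 * math.sqrt(2.0 * k * math.log(1.0 / delta)) + k * eps0 * (math.exp(eps0) - 1.0)

def simplified_composition_epsilon(eps0, k, delta):
    """Same bound with e^{eps0} - 1 replaced by its upper bound 2*eps0 (needs eps0 <= 1)."""
    return eps0 * math.sqrt(2.0 * k * math.log(1.0 / delta)) + 2.0 * k * eps0 ** 2

# Hypothetical values: eps0 stands in for 2*eta/s, k stands in for n*T.
eps0, k, delta = 0.01, 100, 1e-6
exact = advanced_composition_epsilon(eps0, k, delta)
simple = simplified_composition_epsilon(eps0, k, delta)
print(exact, simple)
```

The first term grows like $\sqrt{k}$ while the second grows like $k$, so for small $\varepsilon_0$ the square-root term dominates the total.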

Combining these two claims proves the theorem. □ 

5.4 Handling Arbitrary Low-Sensitivity Queries



We can also modify this mechanism to answer arbitrary $\Delta$-sensitive queries, albeit with worse accuracy bounds. We modify the algorithm to run the multiplicative weights updates over the set of databases $\mathcal{X}^n$ and adjust the parameters. When we run multiplicative weights over a support of size $|\mathcal{X}|^n$, the potential function will be upper bounded by $n\log|\mathcal{X}|$, rather than $\log|\mathcal{X}|$, which will increase the number of epochs we need for convergence.

We will now sketch the argument. We will ignore the parameters $\beta$ and $\delta$ for simplicity. In order to get convergence of the multiplicative weights distribution, we need to take $T \approx n\log|\mathcal{X}|/\eta^2$, and in order to ensure that $H_t$ approximates $D_t$ sufficiently well, we take $n \approx \sqrt{\log k}/\tau^2$. Recall that to argue analyst privacy, we viewed the mechanism as being (essentially) the $nT$-fold composition of $\varepsilon_0$-analyst private mechanisms, where $\varepsilon_0 = \eta/s$. In order to get analyst privacy, we needed

\[ \frac{\eta}{s} \approx \frac{\varepsilon}{\sqrt{nT}} \approx \frac{\varepsilon\tau\eta}{\sqrt{n\log|\mathcal{X}|}\,\log^{1/4}k} \quad\Longrightarrow\quad s \approx \frac{\sqrt{n\log|\mathcal{X}|}\,\log^{1/4}k}{\varepsilon\tau}. \]

Once we have set $s$ (as a function of the other parameters) to achieve analyst privacy, we can work on establishing data privacy. As before, the number of bad rounds will be

\[ R \approx sT \approx \frac{n^{3/2}\log^{3/2}|\mathcal{X}|\,\log^{1/4}k}{\varepsilon\eta^3}. \]

Given this bound on the number of bad rounds, we need to set

\[ \sigma \approx \frac{\Delta\sqrt{R}}{\varepsilon} \approx \frac{\Delta\,n^{3/4}\log^{3/4}|\mathcal{X}|\,\log^{1/8}k}{(\varepsilon\eta)^{3/2}} \]

to obtain data privacy and

\[ \tau \approx \sigma\log k \approx \frac{\Delta\,n^{3/4}\log^{3/4}|\mathcal{X}|\,\log^{9/8}k}{(\varepsilon\eta)^{3/2}} \]

to ensure that all the update queries truly have large error on the current hypothesis $H_t$.

The final error bound will come from observing that $\eta$ and $\tau$ are both lower bounds on the error. The error is bounded below by $\tau$ because that is the noise threshold set by the algorithm. And $\tau$ must be larger than $\eta$ or else we cannot argue that multiplicative weights makes progress during update rounds. Thus setting $\eta = \tau$ will approximately minimize the error. The final error bound we obtain is

\[ O\!\left(\frac{\Delta^{2/5}\,n^{3/10}\log^{3/10}|\mathcal{X}|\,\log^{9/20}k}{\varepsilon^{2/5}}\right), \]

which gives a non-trivial error guarantee when $\Delta \ll 1/n^{3/4}$.



6 Conclusions 

In this paper, we have shown that it is possible to privately answer many queries while also preserving the privacy of the data analysts, even if multiple analysts may collude, or if a single analyst may register multiple accounts with the data administrator. By doing so, we have put analyst privacy on the same sound worst-case footing that differential privacy has always stood on. We have shown that analyst privacy is nearly costless, at least if we restrict ourselves to one-query-to-many-analyst privacy for linear queries in the non-interactive setting. In this case, we are able to recover the nearly optimal $O(1/\sqrt{n})$ error bound achievable even without promising analyst privacy. However, it remains open whether this bound is achievable for one-analyst-to-many-analyst privacy, or for non-linear queries, or in the interactive query release setting. Conceptually, the most interesting open question here is: can one-analyst-to-many-analyst privacy be achieved without any asymptotic cost to query accuracy?

We have also introduced a novel framework for posing the private query release problem and its solution: 
namely, viewing it as an equilibrium computation problem in a two-player zero-sum game. This allows us 
to easily move between privacy guarantees by modifying the strategies of the different players, and the 
neighboring relationship on game matrices (i.e. differing in a single row for analyst privacy, vs. differing 
by 1/n in norm for data privacy). We expect that this will be a useful point of view to take for other 
problems in data privacy as well. It is also known how to privately compute equilibria in certain types of multi-player games [KPRU12]. Is there a useful way to think of this multi-player generalization when solving problems in private data release?

A Proof of Lemma 3.8

First we restate the lemma. Consider the following process:

• Fix an $(\varepsilon, \delta)$-differentially private mechanism $A : \mathcal{U}^* \to \mathcal{R}$ and a bit $b \in \{0, 1\}$. Let $D_0 = \emptyset$.

• For $t = 1, \ldots, T$:

 - The (possibly randomized) adversary $B(y_1, \ldots, y_{t-1}; r)$ chooses two distributions $B_t^0, B_t^1$ such that $SD(B_t^0, B_t^1) \le \sigma$.

 - Choose $x_t \leftarrow_R B_t^b$ and let $D_t = D_{t-1} \cup \{x_t\}$.

 - Choose $y_t \leftarrow_R A(D_t)$.

For a fixed mechanism $A$ and adversary $B$, let $\mathcal{V}^0$ be the distribution on $(y_1, \ldots, y_T)$ when $b = 0$ and $\mathcal{V}^1$ be the distribution on $(y_1, \ldots, y_T)$ when $b = 1$.
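
Lemma A.1. If $\varepsilon \le 1/2$ and $T\sigma \le 1/12$, then with probability at least $1 - T\delta - \delta'$ over $y = (y_1, \ldots, y_T) \leftarrow_R \mathcal{V}^0$,

\[ \left|\ln\frac{\mathcal{V}^0(y)}{\mathcal{V}^1(y)}\right| \le \varepsilon(T\sigma)\sqrt{2T\log(1/\delta')} + 30\varepsilon^2(T\sigma)T. \]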

Proof. Given distributions $B^0, B^1$ such that $SD(B^0, B^1) \le \sigma$, there exist distributions $C^0, C^1, C$ such that $B^0 = \sigma C^0 + (1 - \sigma)C$ and $B^1 = \sigma C^1 + (1 - \sigma)C$. An alternative way to sample from the distribution $B^b$ is to flip a coin $c \in \{0, 1\}$ with bias $\sigma$, and if the coin comes up 1, sample from $C^b$, otherwise sample from $C$.
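For finite distributions this decomposition can be written down explicitly. The sketch below is one possible construction, assuming the distributions are given as probability vectors and taking $\sigma$ to be exactly the statistical distance (the proof only needs $SD \le \sigma$); the function names are illustrative.

```python
import numpy as np

def split_into_shared_and_private(p0, p1):
    """Return (sigma, c0, c1, c) with p_b = sigma * c_b + (1 - sigma) * c."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    overlap = np.minimum(p0, p1)          # mass the two distributions share
    sigma = 1.0 - overlap.sum()           # equals the statistical distance
    if sigma <= 0.0:                      # identical distributions: c_b is never used
        return 0.0, p0, p1, p0
    if sigma >= 1.0:                      # disjoint supports: c is never used
        return 1.0, p0, p1, p0
    c0 = (p0 - overlap) / sigma
    c1 = (p1 - overlap) / sigma
    c = overlap / (1.0 - sigma)
    return sigma, c0, c1, c

def sample_via_coin(b, sigma, c0, c1, c, rng):
    """Sample from p_b by flipping a sigma-biased coin, as in the proof."""
    if rng.random() < sigma:
        return int(rng.choice(len(c), p=(c0, c1)[b]))
    return int(rng.choice(len(c), p=c))

p0 = [0.5, 0.3, 0.2]
p1 = [0.4, 0.4, 0.2]
sigma, c0, c1, c = split_into_shared_and_private(p0, p1)
print(sigma, c0, c1, c)
print(sample_via_coin(0, sigma, c0, c1, c, np.random.default_rng(0)))
```

The point of the construction is that the bit $b$ only influences the sample when the coin comes up 1, which is what lets the proof bound the number of rounds in which the two databases can differ.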

Consider a partial transcript $(r, y_1, \ldots, y_{t-1})$. Fixing the randomness of the adversary will fix the coins $c_1, \ldots, c_T$, which determine whether the adversary samples from $C_j^b$ or $C_j$ for $j \in [T]$. Let $w = \sum_{j=1}^{T} c_j$. Fixing the randomness of the adversary and $y_1, \ldots, y_{t-1}$ will also fix the distributions $C_j$ for $j \le t$ and, in rounds for which $c_j = 0$, will fix the samples $x_j$ for $j \le t$. If we let $D_t^0, D_t^1$ denote the database $D_t$ in the case where $b = 0, 1$, respectively, then we have

\[ |D_t^0 \,\triangle\, D_t^1| \le \sum_{j=1}^{t} c_j \le \sum_{j=1}^{T} c_j = w. \]

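Thus, we have

\[ \left|\ln\frac{\mathcal{V}_t^0(y_t \mid r, y_1, \ldots, y_{t-1})}{\mathcal{V}_t^1(y_t \mid r, y_1, \ldots, y_{t-1})}\right| \le w\varepsilon \]

and

\[ \mathbb{E}\left[\ln\frac{\mathcal{V}_t^0(y_t \mid r, y_1, \ldots, y_{t-1})}{\mathcal{V}_t^1(y_t \mid r, y_1, \ldots, y_{t-1})}\right] \le w\varepsilon\,\min\{e^{w\varepsilon} - 1, 1\}, \]

where the expectation is taken over $\mathcal{V}_t^0 \mid r, y_1, \ldots, y_{t-1}$.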

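Fix $w \in \{0, \ldots, T\}$. Conditioning on any $r$ such that $\sum_{t=1}^{T} c_t = w$, we can apply Azuma's inequality as in [DRV10] to obtain

\[ D_\infty^{T\delta + \delta'}\bigl(\mathcal{V}^0|_w \,\|\, \mathcal{V}^1|_w\bigr) \le w\varepsilon\sqrt{2T\log(1/\delta')} + w\varepsilon\,\min\{e^{w\varepsilon} - 1, 1\}\,T. \]

Thus

\begin{align*}
D_\infty^{T\delta + \delta'}\bigl(\mathcal{V}^0 \,\|\, \mathcal{V}^1\bigr) &\le \sum_{w=1}^{T}\Pr[w]\Bigl(w\varepsilon\sqrt{2T\log(1/\delta')} + w\varepsilon\,\min\{e^{w\varepsilon} - 1, 1\}\,T\Bigr) \\
&= \sum_{w=1}^{T}\Pr[w]\,w\varepsilon\sqrt{2T\log(1/\delta')} + \sum_{w=1}^{T}\Pr[w]\,w\varepsilon\,\min\{e^{w\varepsilon} - 1, 1\}\,T. \tag{1}
\end{align*}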



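First we consider the left sum in (1).

\begin{align*}
\sum_{w=1}^{T}\Pr[w]\,w\varepsilon\sqrt{2T\log(1/\delta')} &= \varepsilon\sqrt{2T\log(1/\delta')}\sum_{w=1}^{T}\binom{T}{w}\sigma^{w}(1-\sigma)^{T-w}\,w \\
&= \varepsilon\sqrt{2T\log(1/\delta')}\,(T\sigma)\sum_{w=0}^{T-1}\binom{T-1}{w}\sigma^{w}(1-\sigma)^{T-1-w} \\
&= \varepsilon\sqrt{2T\log(1/\delta')}\,(T\sigma)
\end{align*}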
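The left sum reduces to the mean of a binomial random variable: if each coin $c_t$ comes up 1 independently with probability $\sigma$, then $\sum_{w} \Pr[w]\,w = T\sigma$. A short Python check of this identity, with hypothetical values of $T$ and $\sigma$:

```python
from math import comb

def binomial_weighted_sum(T, sigma):
    """Compute sum_{w=1}^{T} C(T, w) sigma^w (1 - sigma)^(T - w) * w."""
    return sum(comb(T, w) * sigma ** w * (1 - sigma) ** (T - w) * w
               for w in range(1, T + 1))

T, sigma = 20, 0.004            # hypothetical values with T * sigma <= 1/12
print(binomial_weighted_sum(T, sigma), T * sigma)
```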
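
Now we work on the right sum in (1).

\begin{align*}
\sum_{w=1}^{T}\Pr[w]\,\bigl(w\varepsilon\,\min\{e^{w\varepsilon} - 1, 1\}\,T\bigr)
&= \sum_{w=1}^{T}\binom{T}{w}\sigma^{w}(1-\sigma)^{T-w}\bigl(w\varepsilon\,\min\{e^{w\varepsilon} - 1, 1\}\,T\bigr) \\
&\le (4\varepsilon^2 T)\sum_{w=1}^{1/\varepsilon}\binom{T}{w}\sigma^{w}(1-\sigma)^{T-w}\,w^2 + (\varepsilon T)\sum_{w=1/\varepsilon}^{T}\binom{T}{w}\sigma^{w}(1-\sigma)^{T-w}\,w \\
&\le (4\varepsilon^2 T)\sum_{w=1}^{1/\varepsilon}\left(\frac{eT\sigma}{w}\right)^{w} w^2 + (\varepsilon T)\sum_{w=1/\varepsilon}^{T}\left(\frac{eT\sigma}{w}\right)^{w} w \\
&\le (4\varepsilon^2 T)\sum_{w=1}^{1/\varepsilon}(eT\sigma)^{w} + (\varepsilon T)\sum_{w=1/\varepsilon}^{T}(eT\sigma)^{w} && (w^2/w^w \le 1 \text{ for } w \in \mathbb{N}) \\
&\le 4\varepsilon^2 T\,(2eT\sigma) + 2(eT\sigma)^{1/\varepsilon}\,\varepsilon T && (eT\sigma \le 1/4) \\
&\le 24\varepsilon^2(T\sigma)T + 4\varepsilon^2(T\sigma)T \le 30\varepsilon^2(T\sigma)T
\end{align*}

Combining our bounds for the left and right sums in (1) completes the proof. □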



